Abstract

A common algorithm for the computation of eigenvalues of real symmetric tridiagonal matrices is the iteration of certain special maps $F_\sigma$ called shifted QR steps. Such maps preserve spectrum, and a natural common domain is $\mathcal{T}_\Lambda$, the manifold of real symmetric tridiagonal matrices conjugate to the diagonal matrix $\Lambda$. More precisely, a (generic) shift $s \in \mathbb{R}$ defines a map $F_s : \mathcal{T}_\Lambda \to \mathcal{T}_\Lambda$. A strategy $\sigma : \mathcal{T}_\Lambda \to \mathbb{R}$ specifies the shift to be applied at $T$, so that $F_\sigma(T) = F_{\sigma(T)}(T)$. Good shift strategies should lead to fast deflation: some off-diagonal coordinate tends to zero, allowing the problem to be reduced to submatrices. For topological reasons, continuous shift strategies do not obtain fast deflation; many standard strategies are indeed discontinuous. Practical implementation only gives rise systematically to bottom deflation, the convergence to zero of the lowest off-diagonal entry $b(T)$. For most shift strategies, convergence to zero of $b(T)$ is cubic, $|b(F_\sigma(T))| = \Theta(|b(T)|^k)$ for $k = 3$. The existence of arithmetic progressions in the spectrum of $T$ sometimes implies instead quadratic convergence, $k = 2$. The complete integrability of the Toda lattice and the dynamics at non-smooth points are central to our discussion. The text does not assume knowledge of numerical linear algebra.
1 Introduction

In this paper, we study some subtle dynamical aspects of a class of numerical algorithms for eigenvalues of real symmetric matrices. This class includes the classic inverse iteration with different shift strategies, among them the Rayleigh and Wilkinson shifts. We do not assume previous knowledge of numerical linear algebra.

Numerical analysts are familiar with tridiagonalization: given a real symmetric matrix $S$, it is easy to obtain an isospectral matrix $T$ which is tridiagonal, i.e., $(T)_{ij} = 0$ whenever $|i - j| > 1$. For matrices of order approximately between 20 and 1000, it pays to first tridiagonalize and then work in the vector space $\mathcal{T}$ of real symmetric tridiagonal matrices. Let $\Lambda = \mathrm{diag}(\lambda_1 < \lambda_2 < \cdots < \lambda_n)$ be a diagonal matrix with simple spectrum: it turns out that the set $\mathcal{T}_\Lambda \subset \mathcal{T}$ of tridiagonal matrices isospectral to $\Lambda$ is a connected compact smooth oriented manifold ([16], [8]). The algorithms under consideration are defined by iteration of some easily computable map $F : \mathcal{T}_\Lambda \to \mathcal{T}_\Lambda$: given $T_0 \in \mathcal{T}_\Lambda$, we consider the sequence $(F^k(T_0))$. For relevant maps $F$, diagonal matrices in $\mathcal{T}_\Lambda$ are fixed points of $F$.
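Tridiagonalization itself is standard. The NumPy sketch below uses Householder reflections, which are one common way to do it. The text does not prescribe a method, and the function name `tridiagonalize` is illustrative. Each reflection acts as an orthogonal similarity, so the spectrum is preserved.

```python
import numpy as np

def tridiagonalize(S):
    """Reduce a real symmetric matrix S to a symmetric tridiagonal matrix with
    the same spectrum by Householder reflections (illustrative sketch)."""
    T = np.array(S, dtype=float)
    n = T.shape[0]
    for k in range(n - 2):
        x = T[k + 1:, k].copy()
        alpha = np.linalg.norm(x)
        if alpha == 0.0:
            continue  # column k is already zero below the subdiagonal
        # Householder vector v = x + sign(x_0) ||x|| e_1 (sign avoids cancellation).
        v = x.copy()
        v[0] += alpha if x[0] >= 0 else -alpha
        v /= np.linalg.norm(v)
        # Conjugate T by H = I - 2 v v^T acting on indices k+1, ..., n-1.
        T[k + 1:, :] -= 2.0 * np.outer(v, v @ T[k + 1:, :])
        T[:, k + 1:] -= 2.0 * np.outer(T[:, k + 1:] @ v, v)
    return T
```

Entries outside the tridiagonal band are zero only up to round-off. A practical code would overwrite them with exact zeros.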

Let $\mathcal{E} \subset O(n)$ be the group of real orthogonal diagonal matrices, so that $(E)_{ii} = \pm 1$ for $E \in \mathcal{E}$. For each $E \in \mathcal{E}$, the map $\eta : \mathcal{T}_\Lambda \to \mathcal{T}_\Lambda$, $\eta(T) = ETE$, is an involutive diffeomorphism of $\mathcal{T}_\Lambda$: its effect on $T \in \mathcal{T}_\Lambda$ is to change the signs of some subdiagonal entries $(T)_{i+1,i}$. Numerical analysts, familiar with this simple fact, often drop signs of subdiagonal entries. We shall not do likewise, since we are often interested in smoothness issues. Again, relevant maps will be ($\mathcal{E}$-)equivariant, in the sense that $F \circ \eta = \eta \circ F$ for all $\eta$.

Consistently with the involutions above, signs of subdiagonal entries induce a cell decomposition of $\mathcal{T}_\Lambda$. The 0-cells are the $n!$ diagonal matrices, and the top-dimensional $(n-1)$-cells turn out to be $2^{n-1}$ permutohedra (polytopes equivalent to the convex hull of the $n!$ points of $\mathbb{R}^n$ obtained by permuting $n$ fixed distinct real numbers). For $n = 3$, the manifold $\mathcal{T}_\Lambda$ is a bitorus which can be obtained by gluing four hexagons along six circles (see Figure 1).




Figure 1: The cell decomposition of $\mathcal{T}_\Lambda$ for $\Lambda = \mathrm{diag}(4, 5, 7)$

We consider that iteration of the map $F$ has accomplished its job when one subdiagonal entry $(F^k(T))_{i+1,i}$ has absolute value smaller than some prescribed tolerance. In terms of the cell decomposition, we are done when we hit (a thin neighborhood of) a lower-dimensional cell or, in the numerical jargon, when the sequence $(F^k(T))$ undergoes deflation. Notice that if $(T)_{i+1,i} = 0$ then the matrix $T$ splits as $T = T_a \oplus T_b$, where the tridiagonal submatrices $T_a$ and $T_b$ have orders $i$ and $n - i$, respectively. Pragmatically, if $(T)_{i+1,i} \approx 0$ then the spectrum of $T$ is approximately the disjoint union of the spectra of $T_a$ and $T_b$, which are easier to compute.
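The pragmatic remark can be illustrated numerically. The NumPy sketch below uses hypothetical helper names. It zeroes the entry $(T)_{i+1,i}$, computes the spectra of the two blocks, and compares their union with the spectrum of $T$. By Weyl's inequality, the sorted eigenvalues differ by at most $|(T)_{i+1,i}|$, which is the spectral norm of the discarded perturbation.

```python
import numpy as np

def tridiag(diag, sub):
    """Symmetric tridiagonal matrix with the given diagonal and subdiagonal."""
    return np.diag(diag) + np.diag(sub, -1) + np.diag(sub, 1)

def split_spectrum_error(diag, sub, i):
    """Max distance between sorted spec(T) and spec(T_a) U spec(T_b), where
    T_a (order i) and T_b (order n - i) are obtained by zeroing (T)_{i+1,i}."""
    diag = np.asarray(diag, dtype=float)
    sub = np.asarray(sub, dtype=float)
    T = tridiag(diag, sub)
    Ta = tridiag(diag[:i], sub[:i - 1])   # leading i x i block
    Tb = tridiag(diag[i:], sub[i:])       # trailing (n-i) x (n-i) block
    full = np.linalg.eigvalsh(T)
    union = np.sort(np.concatenate([np.linalg.eigvalsh(Ta), np.linalg.eigvalsh(Tb)]))
    return np.max(np.abs(full - union))
```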

Ideally, deflation should happen approximately in the middle, so that each subproblem has order approximately half of the original one. Unfortunately, it is not known how to implement easily computable iterations with this property. Usually the sequence $(F^k(T))$ undergoes bottom deflation:

$$\lim_{k \to +\infty} b(F^k(T)) = 0, \qquad b(T) = (T)_{n,n-1}.$$

Geometrically, we approach one of the $n$ deflation sets $\mathcal{D}^i_{\Lambda,0} \subset \mathcal{T}_\Lambda$, defined by $(T)_{n,n} = \lambda_i$ and $b(T) = 0$.

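Notice that removing the $n$-th row and column yields a diffeomorphism:

$$\mathcal{D}^i_{\Lambda,0} \simeq \mathcal{T}_{\hat\Lambda_i}, \qquad \hat\Lambda_i = \mathrm{diag}(\lambda_1, \ldots, \lambda_{i-1}, \lambda_{i+1}, \ldots, \lambda_n);$$

in particular, $\mathcal{D}^i_{\Lambda,0}$ is connected. In Figure 1 the submanifolds $\mathcal{D}^i_{\Lambda,0}$ are three of the six (removed) circles. It turns out (Proposition 4.2) that, for sufficiently small $\epsilon > 0$, the closed set $\mathcal{D}_{\Lambda,\epsilon} \subset \mathcal{T}_\Lambda$ defined by $|b(T)| \le \epsilon$ has $n$ connected components $\mathcal{D}^i_{\Lambda,\epsilon}$, which are closed tubular neighborhoods of $\mathcal{D}^i_{\Lambda,0}$.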
As algorithms for eigenvalue computation, continuous maps $F$ are problematic.

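Theorem 1 Let $F : \mathcal{T}_\Lambda \to \mathcal{T}_\Lambda$ be a continuous $\mathcal{E}$-equivariant map such that every diagonal matrix in $\mathcal{T}_\Lambda$ is a fixed point of $F$.

(a) The map $F$ is surjective.

(b) If there exist disjoint compact sets $\mathcal{K}_i \supset \mathcal{D}^i_{\Lambda,0}$ with $F(\mathcal{K}_i) \subset \mathrm{int}(\mathcal{K}_i)$, then there exists $T \in \mathcal{T}_\Lambda$ for which the sequence $(F^k(T))$ does not undergo bottom deflation.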

Item (a) already makes $F$ unpromising as an algorithm: for any $k$, the iterate $F^k$ is surjective and, given $k$, there exists $T$ such that $F^k(T)$ is far from deflation. The
additional hypothesis in item (b), which, as we shall see, holds for many algorithms, 
makes F even less desirable. The proof of this result uses methods very different 
from the rest of the paper and is left for the Appendix. 

These phenomena lead numerical analysts to consider discontinuous maps $F$. Among the standard algorithms to compute eigenvalues of matrices in $\mathcal{T}$ are QR steps with different shift strategies: Rayleigh and Wilkinson are familiar examples (excellent references are [17], [5], [8]). Recall that Rayleigh's strategy $\rho$ is continuous and is known to have the unfortunate property (b): there exists a matrix $T$ for which $(F^k(T))$ does not undergo bottom deflation. Wilkinson's strategy, on the other hand, is discontinuous. In this paper, we consider a more general context: we define simple shift strategies, which include the examples above and more.

More precisely, given a matrix $T \in \mathcal{T}$ and $s \in \mathbb{R}$, write $T - sI = QR$, if possible, for an orthogonal matrix $Q$ and an upper triangular matrix $R$ with positive diagonal entries. A shifted QR step is $\Phi(T, s) = Q^*TQ$. As is well known, shifted QR steps preserve spectrum and shape. A function $\sigma : \mathcal{T}_\Lambda \to \mathbb{R}$ is ($\mathcal{E}$-)invariant if $\sigma(\eta(T)) = \sigma(ETE) = \sigma(T)$ for all $T \in \mathcal{T}_\Lambda$ and all $E \in \mathcal{E}$. A simple shift strategy is an invariant function $\sigma : \mathcal{T}_\Lambda \to \mathbb{R}$ satisfying the following condition: there exists $C_\sigma > 0$ such that for all $T \in \mathcal{T}_\Lambda$ there is an eigenvalue $\lambda_i$ with $|\sigma(T) - \lambda_i| \le C_\sigma |b(T)|$.
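The shifted QR step fits in a few lines of NumPy. This is an illustration rather than an efficient implementation. `np.linalg.qr` does not fix the signs on the diagonal of $R$, so the sketch normalizes them to be positive to match the definition above. The helper names are illustrative.

```python
import numpy as np

def qr_positive(M):
    """QR factorization M = Q R with the diagonal of R made nonnegative
    (numpy does not fix these signs, so we normalize them)."""
    Q, R = np.linalg.qr(M)
    d = np.where(np.diag(R) < 0, -1.0, 1.0)
    return Q * d, d[:, None] * R   # (Q D)(D R) = Q R since D^2 = I

def shifted_qr_step(T, s):
    """Unsigned shifted QR step Phi(T, s) = Q^* T Q, where T - sI = Q R.
    Assumes T - sI is invertible, so that the factorization is unique."""
    n = T.shape[0]
    Q, R = qr_positive(T - s * np.eye(n))
    return Q.T @ T @ Q
```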

For technical reasons, we prefer the signed step $F_s(T) = \Phi_+(T, s) = Q_+^*TQ_+$. Here $T - sI = Q_+R_+$, the orthogonal matrix $Q_+$ has positive determinant, and only the first $n - 1$ diagonal entries of the upper triangular matrix $R_+$ are required to be positive. As we shall see, the signed step is smoothly defined on a larger domain, and convergence issues for both kinds of step iterations are essentially equivalent.

Simple shift strategies prescribe shifts: set $F_\sigma(T) = F_{\sigma(T)}(T)$. It turns out that $F_\sigma$ is a well-defined (but usually discontinuous) equivariant map from $\mathcal{T}_\Lambda$ (or some very large subset thereof) to $\mathcal{T}_\Lambda$.

An important question in practice is estimating the rate of deflation, i.e., the rate of convergence to zero of the sequence $b(F_\sigma^k(T))$. Numerical evidence indicates that deflation is often cubic, in the sense that there is a constant $C$ such that $|b(F_\sigma^{k+1}(T))| \le C|b(F_\sigma^k(T))|^3$ for large $k$.
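A quick numerical experiment shows the cubic squeezing for Rayleigh's shift $\rho(T) = (T)_{n,n}$, which is defined in Section 3. The sketch uses the unsigned step with NumPy's own sign convention for the QR factorization. Any two sign conventions differ by conjugating the result by some $E \in \mathcal{E}$. That conjugation only changes signs of subdiagonal entries, so $|b|$ is unaffected. The test matrix and function names are arbitrary choices.

```python
import numpy as np

def qr_step(T, s):
    """Shifted QR step T -> R Q + s I (= Q^* T Q), where T - sI = Q R.
    numpy's sign convention may differ from 'R has positive diagonal'; the
    difference is a conjugation by a diagonal +-1 matrix, so |b| is unchanged."""
    n = T.shape[0]
    Q, R = np.linalg.qr(T - s * np.eye(n))
    return R @ Q + s * np.eye(n)

def rayleigh_history(T, steps):
    """|b(F_rho^k(T))| for k = 0, ..., steps, with rho(T) = (T)_{n,n}."""
    history = [abs(T[-1, -2])]
    for _ in range(steps):
        T = qr_step(T, T[-1, -1])
        history.append(abs(T[-1, -2]))
    return history
```

Starting from a matrix with $|b(T)| = 0.1$, each recorded value should be roughly of the order of the cube of the previous one, until round-off takes over.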

Consider the singular support $\mathcal{S}_\sigma \subset \mathcal{T}_\Lambda$ of a shift strategy $\sigma$, the minimal closed subset of $\mathcal{T}_\Lambda$ on whose complement $\sigma$ is smooth. Away from the singular support $\mathcal{S}_\sigma$, squeezing is cubic.

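Theorem 2 For $\epsilon > 0$ small enough, each deflation neighborhood $\mathcal{D}^i_{\Lambda,\epsilon}$ is invariant under $F_\sigma$. There exists $C > 0$ such that, for all $T \in \mathcal{D}_{\Lambda,\epsilon}$, $|b(F_\sigma(T))| \le C|b(T)|^2$. Also, given a compact set $\mathcal{K} \subset \mathcal{D}_{\Lambda,\epsilon}$ disjoint from $\mathcal{S}_\sigma \cap \mathcal{D}_{\Lambda,0}$, there exists $C_\mathcal{K} > 0$ such that, for all $T \in \mathcal{K}$, $|b(F_\sigma(T))| \le C_\mathcal{K}|b(T)|^3$.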

Although the tubular neighborhoods $\mathcal{D}^i_{\Lambda,\epsilon}$ are invariant under $F_\sigma$, it is not true in general that $F_\sigma^k(T)$ belongs to $\mathcal{D}_{\Lambda,\epsilon}$ for sufficiently large $k$. This is true, however, for the important example of Wilkinson's shift.
For Rayleigh's shift, it is well known that convergence (when it happens) is always cubic; this is a corollary of Theorem 2. Cubic convergence does not hold in general for Wilkinson's strategy. In [5], for $\Lambda = \mathrm{diag}(-1, 0, 1)$, we construct a Cantor-like set $\mathcal{X} \subset \mathcal{T}_\Lambda$ of unreduced initial conditions for which the rate of convergence is strictly quadratic. Sequences starting at $\mathcal{X}$ converge to a reduced matrix which is not diagonal. A part of the set $\mathcal{X}$ is shown (true to scale) in the left part of Figure 2. The Cantor-like aspect is invisible there: the cross section of each of the four visible "curves" really consists of a tiny Cantor set, far smaller than the resolution of the picture. This is consistent with the fact that the Hausdorff dimension of $\mathcal{X}$ is 1. The set $\mathcal{X}$ is the intersection of thinner and thinner wedges; in the right half we show schematically three generations of such wedges. The central vertical line is a step-like discontinuity. The inverse image of the largest wedge $\mathcal{X}_0$ is $\mathcal{X}_1^{(+)} \cup \mathcal{X}_1^{(-)}$, and the inverse image of that is $\mathcal{X}_2^{(++)} \cup \mathcal{X}_2^{(+-)} \cup \mathcal{X}_2^{(-+)} \cup \mathcal{X}_2^{(--)}$.




Figure 2: The set $\mathcal{X}$

As numerical analysts know, shift strategies usually define sequences of matrices which, asymptotically, not only isolate an eigenvalue at the $(n, n)$ position but also isolate, at a slower rate, a second eigenvalue at the $(n-1, n-1)$ position. This does not happen for the example above, where $(F_\omega^k(T))_{n,n}$ tends to the center of a three-term arithmetic progression of eigenvalues and $(F_\omega^k(T))_{n-1,n-2}$ stays bounded away from zero.

A matrix $T \in \mathcal{T}$ with simple spectrum is a.p. free if it does not have three eigenvalues in arithmetic progression, and a.p. otherwise; in particular, generic spectra are a.p. free. In the a.p. free case the situation is very nice: cubic convergence is essentially uniform on $\mathcal{T}_\Lambda$. This condition is reminiscent of Sternberg's resonance hypothesis for normal forms ([15]).

Theorem 3 Let $\Lambda$ be an a.p. free matrix and $\sigma$ a shift strategy for which diagonal matrices do not belong to $\mathcal{S}_\sigma$. Then there exist $\epsilon > 0$, $C > 0$ and $K > 0$ such that:

(a) the deflation neighborhood $\mathcal{D}_{\Lambda,\epsilon}$ is invariant under $F_\sigma$;

(b) for any $T \in \mathcal{D}_{\Lambda,\epsilon}$, the sequence $(F_\sigma^k(T))$ converges to a diagonal matrix, and the set of positive integers $k$ for which $|b(F_\sigma^{k+1}(T))| > C|b(F_\sigma^k(T))|^3$ has at most $K$ elements.

Still, the (at most $K$) iterations at which the cubic estimate fails may occur arbitrarily late along the sequence $(F_\sigma^k(T))$.

An a.p. matrix is strong a.p. if it has three consecutive eigenvalues in arithmetic progression, and weak a.p. otherwise. Under a very mild additional hypothesis, $b(T)$ converges to zero at a cubic rate also for weak a.p. matrices. Let $\mathcal{C}_{\Lambda,0} \subset \mathcal{T}_\Lambda$ be the set of matrices $T$ for which $(T)_{n,n-1} = (T)_{n-1,n-2} = 0$.
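The sketch below classifies a simple spectrum according to these definitions, with "consecutive" read as adjacent in increasing order. It is plain Python. The function name `ap_type` and the tolerance used for floating-point input are illustrative assumptions.

```python
from itertools import combinations

def ap_type(eigs, tol=1e-12):
    """Classify a simple spectrum as 'a.p. free', 'weak a.p.' or 'strong a.p.'.
    Eigenvalues x < y < z are in arithmetic progression when y - x = z - y."""
    lam = sorted(eigs)
    n = len(lam)
    # Strong a.p.: three consecutive (adjacent) eigenvalues in progression.
    if any(abs((lam[i + 1] - lam[i]) - (lam[i + 2] - lam[i + 1])) <= tol
           for i in range(n - 2)):
        return "strong a.p."
    # Weak a.p.: some triple, necessarily non-adjacent here, in progression.
    # combinations() of a sorted list yields x < y < z.
    if any(abs((y - x) - (z - y)) <= tol for x, y, z in combinations(lam, 3)):
        return "weak a.p."
    return "a.p. free"
```

For instance, $\{-1, 0, 1\}$ is strong a.p., $\{1, 2, 4\}$ is a.p. free, and $\{0, 1, 3, 6\}$ is weak a.p. because of the non-adjacent triple $0, 3, 6$.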

Theorem 4 Let $\Lambda$ be a weak a.p. matrix and $\sigma : \mathcal{T}_\Lambda \to \mathbb{R}$ a shift strategy for which $\mathcal{C}_{\Lambda,0}$ and $\mathcal{S}_\sigma$ are disjoint. Then there exists $\epsilon > 0$ such that the deflation neighborhood $\mathcal{D}_{\Lambda,\epsilon}$ is invariant under $F_\sigma$ and, for all unreduced $T \in \mathcal{D}_{\Lambda,\epsilon}$, the sequence $(b(F_\sigma^k(T)))$ converges to zero at a rate which is at least cubic. More precisely, for each unreduced $T \in \mathcal{D}_{\Lambda,\epsilon}$ there exist $C_T, K_T > 0$ such that, for all $k > K_T$, we have

$$|b(F_\sigma^{k+1}(T))| \le C_T\,|b(F_\sigma^k(T))|^3.$$

In particular, the convergence of Wilkinson's strategy is cubic for weak a.p. matrices. However, uniformity in the sense of Theorem 3 is not guaranteed, and the constants $C_T$ and $K_T$ depend on $T$. As in the case of the spectrum $\{-1, 0, 1\}$, we conjecture that if $\Lambda$ is strong a.p. then there exists a set $\mathcal{X} \subset \mathcal{T}_\Lambda$ of Hausdorff codimension 1 of initial conditions $T$ for which the rate of convergence is strictly quadratic.

The celebrated integrability of the Toda lattice ([6], [13]) on unreduced tridiagonal matrices manifests itself in several ways throughout the paper: it provided ample inspiration, but the paper strives to be self-contained. For starters, the steps $F_s$, $s \in \mathbb{R}$, commute in their natural domains (Proposition 2.6). Norming constants (as in [13]) provide angle variables for which steps $F_s$ are translations. Unfortunately, these angle variables break down (as they must!) for reduced matrices $T \in \mathcal{T}_\Lambda$. Since $(F_\sigma^k(T))$ approaches reduced matrices, we prefer to introduce other coordinate systems which extend smoothly to such points. Bidiagonal coordinates, defined in [5], consist of very explicit charts on the manifold $\mathcal{T}_\Lambda$. They are used in [5] to prove the cubic convergence of Rayleigh's shift and in the unpublished manuscript [TU] to prove some of the results presented here for Wilkinson's shift. In Section 4, instead, we introduce tubular coordinates on the tubular neighborhoods $\mathcal{D}^i_{\Lambda,\epsilon}$: steps $F_s$ within these sets are given by a very simple formula (Corollary 4.3).

The (signed) steps $F_\sigma$ are smooth whenever the shift strategy is, i.e., for $T \in \mathcal{D}_{\Lambda,\epsilon} \setminus \mathcal{S}_\sigma$ ("unsigned steps" would not be smooth on limit points). At matrices $T \in \mathcal{D}_{\Lambda,0}$ on which $F_\sigma$ is smooth, the map $T \mapsto b(F_\sigma(T))$ has zero gradient. The symmetry of the shift strategy yields a cubic Taylor expansion and therefore an estimate $|b(F_\sigma(T))| \le C|b(T)|^3$, settling Theorem 2.

Height functions $H_i : \mathcal{D}^i_{\Lambda,\epsilon} \to \mathbb{R}$ (similar to Lyapunov functions) are used for further study of the sequence $(F_\sigma^k(T))$ in the a.p. free case. More precisely, for shifts $s$ near $\lambda_i$, $H_i(F_s(T)) > H_i(T)$ provided $T \in \mathcal{D}^i_{\Lambda,\epsilon}$ is not diagonal. This is another manifestation of the Toda dynamics. Theorem 3 then follows by a compactness argument that bounds the number of iterations for which $F_\sigma^k(T)$ stays close to the singular support $\mathcal{S}_\sigma$.

For a.p. spectra the situation is subtler, as can be seen from the example in [9] and Figure 2. On the other hand, Theorem 4 tells us that the weak a.p. hypothesis, together with an appropriate smoothness condition, guarantees cubic convergence.

In Section 2 we list the basic properties of the signed shifted QR step on the manifold $\mathcal{T}_\Lambda$. Simple shift strategies are introduced in Section 3, and the standard examples are shown to satisfy the definition. We define the deflation set $\mathcal{D}_{\Lambda,0}$ and neighborhood $\mathcal{D}_{\Lambda,\epsilon}$ in Section 4 and then set up tubular coordinates. The local theory of steps $F_s$ near $\mathcal{D}_{\Lambda,0}$ and the proof of Theorem 2 are presented in Section 5. In Section 6 we construct the height functions $H_i$ and then prove Theorem 3. The convergence properties for a.p. matrices in Theorem 4 are proved in Section 7. We present in Section 8 two counterexamples to natural but incorrect strengthenings of Theorems 3 and 4. Finally, the Appendix is dedicated to Theorem 1.

The authors are very grateful for the abundant contributions of several readers of 
this work and its previous versions. The authors acknowledge support from CNPq, 
CAPES, INCT-Mat and FAPERJ. 

2 The manifold $\mathcal{T}_\Lambda$ and shifted steps $F_s$

Let $\mathcal{T}$ denote the real vector space of $n \times n$ real symmetric tridiagonal matrices, endowed with the norm $\|T\|^2 = \mathrm{tr}(T^2)$. For $T \in \mathcal{T}$, the subdiagonal entries of $T$ are $(T)_{i+1,i}$ for $i = 1, \ldots, n-1$. The lowest subdiagonal entry of $T$ is $b(T) = (T)_{n,n-1}$.
As usual, let $SO(n)$ denote the set of orthogonal matrices with determinant equal to 1. Let $\Lambda$ be a real diagonal matrix with simple eigenvalues $\lambda_1 < \cdots < \lambda_n$. Define the isospectral manifold

$$\mathcal{T}_\Lambda = \{Q^*\Lambda Q,\ Q \in SO(n)\} \cap \mathcal{T},$$

the set of matrices in $\mathcal{T}$ similar to $\Lambda$. The set $\mathcal{T}_\Lambda \subset \mathcal{T}$ is a real smooth manifold ([16]; [8] describes an explicit atlas of $\mathcal{T}_\Lambda$).

For a matrix $M$, the QR factorization is $M = QR$ for an orthogonal matrix $Q$ and an upper triangular matrix $R$ with positive diagonal. The $Q_+R_+$ factorization, instead, is $M = Q_+R_+$ for $Q_+ \in SO(n)$ and $R_+$ an upper triangular matrix with $(R_+)_{i,i} > 0$, $i = 1, \ldots, n-1$. A real $n \times n$ matrix $M$ is almost invertible if its first $n - 1$ columns are linearly independent. Notice that almost invertible matrices form an open dense subset of the $n \times n$ matrices. The diagonal matrix $E_{n-1}$ is such that $(E_{n-1})_{i,i}$ is 1 for $i < n$ and $-1$ for $i = n$.

Proposition 2.1 An almost invertible real matrix $M$ admits a unique $Q_+R_+$ factorization, with $Q_+$ and $R_+$ depending smoothly on $M$. If $M$ is invertible, it admits unique (smooth) factorizations $M = QR = Q_+R_+$. If $\det M > 0$, the factorizations are equal, i.e., $Q = Q_+$ and $R = R_+$. If $\det M < 0$, $Q = Q_+E_{n-1}$ and $R = E_{n-1}R_+$. If $\det M = 0$, $(R_+)_{n,n} = 0$.

Proof: Let $M$ be almost invertible. Applying Gram-Schmidt with positive normalizations on its first $n - 1$ columns, we obtain the first $n - 1$ columns of both $Q$ and $R$, as well as those of $Q_+$ and $R_+$. The last column $v = Q_+e_n$ of $Q_+$ is then well defined, by orthonormality and the fact that $\det Q_+ = 1$. Now set $R_+ = (Q_+)^*M$. The positivity of $R_{n,n}$ specifies whether the last column of $Q$ is $v$ or $-v$. Smoothness is clear by construction.

If $M$ is invertible, $\det M = \det Q_+ \det R_+$ implies that the last diagonal entry of $R_+$ has the same sign as $\det M$; the relations between the factorizations then follow. If $M$ is not invertible, the relation among determinants implies $(R_+)_{n,n} = 0$. ■
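The proof translates directly into a NumPy sketch, assuming `np.linalg.qr` as the underlying Gram-Schmidt-equivalent factorization. The sketch fixes the signs of the first $n - 1$ diagonal entries of $R$, then fixes the sign of the last column of $Q$ so that $\det Q_+ = 1$. The names `qr_plus` and `signed_step` are illustrative.

```python
import numpy as np

def qr_plus(M):
    """Q+R+ factorization: M = Q R with Q in SO(n) and R upper triangular with
    R[i, i] > 0 for the first n-1 indices. Assumes M is almost invertible."""
    Q, R = np.linalg.qr(M)
    d = np.where(np.diag(R) < 0, -1.0, 1.0)
    d[-1] = 1.0                      # the last diagonal sign is not prescribed
    Q, R = Q * d, d[:, None] * R     # Q D and D R with D = diag(d): Q R unchanged
    if np.linalg.det(Q) < 0:         # force det Q = +1 via the last column / row
        Q[:, -1] *= -1.0
        R[-1, :] *= -1.0
    return Q, R

def signed_step(T, s):
    """Signed shifted QR step F_s(T) = Phi_+(T, s) = Q+^* T Q+, T - sI = Q+ R+."""
    Q, _ = qr_plus(T - s * np.eye(T.shape[0]))
    return Q.T @ T @ Q
```

As the proposition predicts, $(R_+)_{n,n}$ carries the sign of $\det M$ and vanishes when $M$ is singular.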

If all subdiagonal entries of $T$ are nonzero, $T$ is an unreduced matrix; otherwise, $T$ is reduced. Notice that an unreduced tridiagonal matrix is almost invertible. Indeed, the block formed by rows $2, \ldots, n$ and columns $1, \ldots, n-1$ is an upper triangular matrix with nonzero diagonal entries $(T)_{i+1,i}$, and is therefore invertible.

We consider the shifted QR step and its signed counterpart,

$$\Phi(T, s) = Q^*TQ, \qquad \Phi_+(T, s) = Q_+^*TQ_+,$$

where $T - sI = QR$ and $T - sI = Q_+R_+$. Let $\mathrm{Dom}(\Phi)$ be the set of pairs $(T, s) \in \mathcal{T} \times \mathbb{R}$ for which $T - sI$ is invertible. $\mathrm{Dom}(\Phi)$ is open and dense in $\mathcal{T} \times \mathbb{R}$, and, by the Gram-Schmidt algorithm, $\Phi$ is smooth in $\mathrm{Dom}(\Phi)$. Similarly, the above proof shows that $\Phi_+$ is smooth in $\mathrm{Dom}(\Phi_+)$, where $(T, s) \in \mathrm{Dom}(\Phi_+)$ if $T - sI$ is almost invertible. Clearly, $\mathrm{Dom}(\Phi)$ is strictly contained in $\mathrm{Dom}(\Phi_+)$.

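Lemma 2.2 For $(T, s) \in \mathrm{Dom}(\Phi)$ (resp. $\mathrm{Dom}(\Phi_+)$), we have $\Phi(T, s) \in \mathcal{T}$ (resp. $\Phi_+(T, s) \in \mathcal{T}$). The spectra of $T$, $\Phi(T, s)$ and $\Phi_+(T, s)$ are equal. In the appropriate domains, for $T - sI = QR = Q_+R_+$ and $j = 1, 2, \ldots, n-1$,

$$(\Phi(T, s))_{j+1,j} = \frac{(R)_{j+1,j+1}}{(R)_{j,j}}\,(T)_{j+1,j}, \qquad (\Phi_+(T, s))_{j+1,j} = \frac{(R_+)_{j+1,j+1}}{(R_+)_{j,j}}\,(T)_{j+1,j}.$$

Thus, the top $n - 2$ subdiagonal entries of $T$, $\Phi(T, s)$ and $\Phi_+(T, s)$ have the same sign; also, $\mathrm{sign}\,(T)_{n,n-1} = \mathrm{sign}\,(\Phi(T, s))_{n,n-1}$.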
Proof: We prove the statements for $\Phi_+$; the others are then easy.

For a pair $(T, s) \in \mathrm{Dom}(\Phi) \subset \mathrm{Dom}(\Phi_+)$, there are two expressions for $\Phi_+(T, s)$:

$$\Phi_+(T, s) = Q_+^*TQ_+ = R_+TR_+^{-1}, \qquad \text{where } T - sI = Q_+R_+.$$

From the first equality, $\Phi_+(T, s)$ is symmetric, and from the second, $\Phi_+(T, s)$ is an upper Hessenberg matrix, so that $\Phi_+(T, s) \in \mathcal{T}$ is similar to $T$. More generally, for $(T, s) \in \mathrm{Dom}(\Phi_+)$ we still have

$$\Phi_+(T, s)\,R_+ = Q_+^*TQ_+R_+ = Q_+^*T(T - sI) = Q_+^*(T - sI)T = R_+T,$$

and therefore $\Phi_+(T, s) \in \mathcal{T}$ is similar to $T$. Comparing the $(j+1, j)$ entries of both sides of the last equation gives $(\Phi_+(T, s))_{j+1,j}(R_+)_{j,j} = (R_+)_{j+1,j+1}(T)_{j+1,j}$, completing the proof. ■
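The subdiagonal formula of Lemma 2.2 is easy to check numerically. The sketch below does so for the unsigned step $\Phi$, after normalizing NumPy's QR factorization so that $R$ has a positive diagonal. The signed case is analogous. The function name and test matrix are illustrative.

```python
import numpy as np

def subdiagonal_formula(T, s):
    """Return ((Phi(T,s))_{j+1,j}, R_{j+1,j+1} / R_{j,j} * (T)_{j+1,j}) as arrays
    over j, where T - sI = Q R with positive diag(R) (T - sI invertible)."""
    n = T.shape[0]
    Q, R = np.linalg.qr(T - s * np.eye(n))
    d = np.where(np.diag(R) < 0, -1.0, 1.0)   # normalize: positive diagonal of R
    Q, R = Q * d, d[:, None] * R
    Phi = Q.T @ T @ Q
    r = np.diag(R)
    lhs = np.diag(Phi, -1)
    rhs = r[1:] / r[:-1] * np.diag(T, -1)
    return lhs, rhs
```

Since every diagonal entry of $R$ is positive here, each subdiagonal entry keeps its sign.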

The following result describes the behavior of $\Phi_+$ at points not in $\mathrm{Dom}(\Phi)$, which will play an important role throughout the paper.

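Lemma 2.3 If $(T, s) \in \mathrm{Dom}(\Phi_+) \setminus \mathrm{Dom}(\Phi)$, then

$$b(\Phi_+(T, s)) = (\Phi_+(T, s))_{n,n-1} = 0, \qquad (\Phi_+(T, s))_{n,n} = s.$$

At a point $(T, s) \in \mathrm{Dom}(\Phi_+)$ with $b(T) = 0$ and $s = (T)_{n,n}$, we have $\mathrm{grad}(b \circ \Phi_+) = 0$.

Proof: Since $T - sI = Q_+R_+$ is not invertible, $(R_+)_{n,n} = 0$. So the last row of $R_+$ vanishes, and therefore $R_+^*e_n = 0$. Thus $v = (Q_+^*)^{-1}e_n = Q_+e_n$ satisfies $(T - sI)v = R_+^*Q_+^*Q_+e_n = R_+^*e_n = 0$, where we used the symmetry of $T - sI$. We then have $\Phi_+(T, s)e_n = Q_+^*TQ_+e_n = Q_+^*Tv = Q_+^*(sv) = se_n$, proving the first claim. For the second claim, since $T - sI$ is almost invertible, $(R_+)_{j,j} > 0$ for $j < n$. From the previous lemma,

$$(b \circ \Phi_+)(T, s) = \frac{(R_+)_{n,n}}{(R_+)_{n-1,n-1}}\,b(T).$$

If $b(T) = 0$ and $s = (T)_{n,n}$, then $(R_+)_{n,n} = 0$. Hence $b \circ \Phi_+$ is a product of two smooth functions that both vanish at this point, which yields $\mathrm{grad}(b \circ \Phi_+) = 0$. ■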
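The first claim of Lemma 2.3 says that a single signed step with an exact eigenvalue as shift deflates the matrix. The self-contained NumPy sketch below illustrates this. The shift is an eigenvalue computed in floating point, so the deflation holds only up to round-off. The helper names are illustrative.

```python
import numpy as np

def qr_plus(M):
    """M = Q R with Q in SO(n), first n-1 diagonal entries of R positive
    (M assumed almost invertible; it may be singular)."""
    Q, R = np.linalg.qr(M)
    d = np.where(np.diag(R) < 0, -1.0, 1.0)
    d[-1] = 1.0
    Q, R = Q * d, d[:, None] * R
    if np.linalg.det(Q) < 0:
        Q[:, -1] *= -1.0
        R[-1, :] *= -1.0
    return Q, R

def signed_step(T, s):
    """F_s(T) = Q+^* T Q+ with T - sI = Q+ R+."""
    Q, _ = qr_plus(T - s * np.eye(T.shape[0]))
    return Q.T @ T @ Q
```

For an unreduced $T$ and any eigenvalue $\lambda$, the result of `signed_step(T, lam)` should have a negligible entry $(n, n-1)$ and an entry $(n, n)$ equal to $\lambda$.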

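The operation of changing subdiagonal signs, i.e., of conjugation by some $E \in \mathcal{E}$, behaves well with respect to $\Phi$ and $\Phi_+$. For $1 \le i < n$, let $E_i \in \mathcal{E}$ be defined by $(E_i)_{j,j} = 1$ for $j \le i$ and $(E_i)_{j,j} = -1$ for $j > i$.

Together with $-I$, the matrices $E_i$ generate $\mathcal{E}$. For $T \in \mathcal{T}$, $\eta_i(T) = E_iTE_i$ differs from $T$ only in the sign of the $i$-th subdiagonal coordinate: $(\eta_i(T))_{i+1,i} = -(T)_{i+1,i}$. The nontrivial involutions in $\mathcal{T}_\Lambda$ are therefore generated by $\eta_i$, $1 \le i < n$.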

Lemma 2.4 The domains $\mathrm{Dom}(\Phi)$ and $\mathrm{Dom}(\Phi_+)$ are $\mathcal{E}$-invariant and

$$\Phi(\eta(T), s) = \eta(\Phi(T, s)), \qquad \Phi_+(\eta(T), s) = \eta(\Phi_+(T, s)).$$

If $\det(T - sI) > 0$ then $\Phi(T, s) = \Phi_+(T, s)$. If $\det(T - sI) < 0$, then $\Phi(T, s) = \eta_{n-1}(\Phi_+(T, s))$. If $\det(T - sI) = 0$ and $(T, s) \in \mathrm{Dom}(\Phi_+)$, then $b(\Phi_+(T, s)) = 0$.

Proof: For $(T, s) \in \mathrm{Dom}(\Phi)$, the matrices $T - sI$ and $E(T - sI)E$ are both invertible. The QR factorization $T - sI = QR$ yields $ETE - E(sI)E = (EQE)(ERE)$, which preserves the positivity of the diagonal entries of the triangular part, so

$$\Phi(ETE, s) = (EQE)^*ETE(EQE) = EQ^*TQE = E\Phi(T, s)E.$$

The argument is similar for $\Phi_+$. The claims for $T - sI$ invertible follow from the relation between $Q$ and $Q_+$ in Proposition 2.1. The case $\det(T - sI) = 0$ is a repetition of Lemma 2.3. ■
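Lemma 2.4 can be checked numerically. The sketch below post-processes NumPy's QR into the two sign conventions used here. It then compares $\Phi$ with $\eta_{n-1}(\Phi_+)$ at a shift where $\det(T - sI) < 0$, and it tests equivariance under one choice of $E$. The function names and matrices are illustrative.

```python
import numpy as np

def normalized_qr(M, signed=False):
    """signed=False: M = Q R with diag(R) > 0 (M invertible).
    signed=True:  M = Q+ R+ with Q+ in SO(n), first n-1 entries of diag(R+) > 0."""
    Q, R = np.linalg.qr(M)
    d = np.where(np.diag(R) < 0, -1.0, 1.0)
    if signed:
        d[-1] = 1.0  # the last sign is fixed below through det(Q+) = +1
    Q, R = Q * d, d[:, None] * R
    if signed and np.linalg.det(Q) < 0:
        Q[:, -1] *= -1.0
        R[-1, :] *= -1.0
    return Q, R

def phi(T, s, signed=False):
    """Phi(T, s) = Q^* T Q, or Phi_+(T, s) = Q+^* T Q+ when signed=True."""
    Q, _ = normalized_qr(T - s * np.eye(T.shape[0]), signed)
    return Q.T @ T @ Q
```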
We are only interested in the case when the spectrum of $T$ is simple, since a double eigenvalue implies reducibility. Since either version of the shifted QR step preserves spectrum, restriction defines smooth maps $\Phi : (\mathcal{T}_\Lambda \times \mathbb{R}) \cap \mathrm{Dom}(\Phi) \to \mathcal{T}_\Lambda$ and $\Phi_+ : (\mathcal{T}_\Lambda \times \mathbb{R}) \cap \mathrm{Dom}(\Phi_+) \to \mathcal{T}_\Lambda$.

Still in $\mathcal{T}_\Lambda$, it is convenient to consider the step $F_s(T) = \Phi_+(T, s)$. For $s$ not an eigenvalue of $\Lambda$, the domain of $F_s$ is $\mathcal{T}_\Lambda$. The natural domain for $F_{\lambda_i}$, instead, is the deflation domain $\mathcal{D}^i_\Lambda$, the open dense subset of $\mathcal{T}_\Lambda$ of matrices $T$ for which $T - \lambda_iI$ is almost invertible. In other words, $T \in \mathcal{D}^i_\Lambda$ if and only if $\lambda_i$ is an eigenvalue of the lowest irreducible block of $T$.

The definition of the step $F_s$ differs from the usual one in that we use $\Phi_+$ instead of $\Phi$. Given Lemma 2.4, considerations about deflation are unaffected, and our choice has the advantage of being smooth (and well defined) in $\mathcal{D}^i_\Lambda$.

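The ($i$-th) deflation set is

$$\mathcal{D}^i_{\Lambda,0} = \{T \in \mathcal{T}_\Lambda \mid b(T) = 0,\ (T)_{n,n} = \lambda_i\}.$$

Since the spectrum of $\Lambda$ is simple, $\mathcal{D}^i_{\Lambda,0} \subset \mathcal{D}^i_\Lambda$. Also, if $i \ne j$ then $\mathcal{D}^i_{\Lambda,0} \cap \mathcal{D}^j_{\Lambda,0} = \emptyset$. We saw in Lemma 2.3 that when the shift is taken to be an eigenvalue, a single step deflates a matrix, i.e., that the image of $F_{\lambda_i}$ is contained in $\mathcal{D}^i_{\Lambda,0}$. We shall see in Proposition 2.5 that this image is in fact equal to $\mathcal{D}^i_{\Lambda,0}$.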

Proposition 2.5 If $s$ is not an eigenvalue of $\Lambda$, the map $F_s : \mathcal{T}_\Lambda \to \mathcal{T}_\Lambda$ is a diffeomorphism. The image of $F_{\lambda_i} : \mathcal{D}^i_\Lambda \to \mathcal{T}_\Lambda$ is $\mathcal{D}^i_{\Lambda,0}$. The restriction $F_{\lambda_i}|_{\mathcal{D}^i_{\Lambda,0}} : \mathcal{D}^i_{\Lambda,0} \to \mathcal{D}^i_{\Lambda,0}$ is a diffeomorphism.

Proof: If $s$ is not an eigenvalue, compute $F_s^{-1}(T)$ by factoring $T - sI$ as $RQ$, with $R$ upper triangular with the first $n - 1$ diagonal entries positive and $Q \in SO(n)$. We claim that $F_s(T_0) = T$ for $T_0 = QR + sI$, which proves that $F_s$ is a diffeomorphism. Indeed, $QR = T_0 - sI$ is a $Q_+R_+$ factorization, and thus $F_s(T_0) = Q^*T_0Q = RQ + sI = T$.

From Lemma 2.3, the image of $F_{\lambda_i}$ is contained in $\mathcal{D}^i_{\Lambda,0} \subset \mathcal{D}^i_\Lambda$. The fact that the restriction of $F_{\lambda_i}$ to $\mathcal{D}^i_{\Lambda,0}$ is a diffeomorphism is proved as in the previous paragraph. ■
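The first part of the proof is constructive: $F_s^{-1}(T) = QR + sI$, where $T - sI = RQ$. The sketch below obtains the $RQ$ factorization from NumPy's QR of $M^*J$, where $J$ is the exchange (row-reversal) matrix. This is a standard trick and is not taken from the text. The sketch then checks that $F_s(F_s^{-1}(T)) = T$. All function names are illustrative.

```python
import numpy as np

def qr_plus(M):
    """M = Q R with Q in SO(n), first n-1 diagonal entries of R positive."""
    Q, R = np.linalg.qr(M)
    d = np.where(np.diag(R) < 0, -1.0, 1.0)
    d[-1] = 1.0
    Q, R = Q * d, d[:, None] * R
    if np.linalg.det(Q) < 0:
        Q[:, -1] *= -1.0
        R[-1, :] *= -1.0
    return Q, R

def rq_plus(M):
    """M = R Q with R upper triangular, first n-1 entries of diag(R) positive,
    Q in SO(n). From M^T J = Q1 R1 we get M = (J R1^T J)(J Q1^T)."""
    n = M.shape[0]
    J = np.eye(n)[::-1]
    Q1, R1 = np.linalg.qr(M.T @ J)
    R, Q = J @ R1.T @ J, J @ Q1.T    # J R1^T J is upper triangular
    d = np.where(np.diag(R) < 0, -1.0, 1.0)
    d[-1] = 1.0
    R, Q = R * d, d[:, None] * Q     # (R D)(D Q) = R Q since D^2 = I
    if np.linalg.det(Q) < 0:
        R[:, -1] *= -1.0
        Q[-1, :] *= -1.0
    return R, Q

def signed_step(T, s):
    """F_s(T) = Q+^* T Q+ with T - sI = Q+ R+."""
    Q, _ = qr_plus(T - s * np.eye(T.shape[0]))
    return Q.T @ T @ Q

def inverse_signed_step(T, s):
    """F_s^{-1}(T) = Q R + sI, where T - sI = R Q (s not an eigenvalue)."""
    R, Q = rq_plus(T - s * np.eye(T.shape[0]))
    return Q @ R + s * np.eye(T.shape[0])
```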

Commutativity of steps is well known and related to the complete integrability of the interpolating Toda flows ([5], [11], [TJ], [13]). For the reader's convenience we provide a quick proof.

Proposition 2.6 Steps commute: $F_{s_0} \circ F_{s_1} = F_{s_1} \circ F_{s_0}$ in the appropriate domains.

The domain of $F_{s_0} \circ F_{s_1} = F_{s_1} \circ F_{s_0}$ is $\mathcal{T}_\Lambda$ if neither $s_0$ nor $s_1$ is an eigenvalue. It is $\mathcal{D}^i_\Lambda$ if $s_0 = \lambda_i$ and $s_1$ is not an eigenvalue (or vice versa). It is empty in the rather pointless case $s_0 = \lambda_i$, $s_1 = \lambda_j$, $i \ne j$.

Proof: We prove commutativity only when $s_0$ and $s_1$ are not eigenvalues; the other cases follow easily. Consider the factorizations

$$T - s_0I = Q_0R_0, \qquad T - s_1I = Q_1R_1, \qquad (T - s_0I)(T - s_1I) = (T - s_1I)(T - s_0I) = Q_2R_2.$$

For $F_{s_0}(T) - s_1I = Q_0^*(T - s_1I)Q_0 = Q_3R_3$, we have $F_{s_1}(F_{s_0}(T)) = Q_3^*F_{s_0}(T)Q_3 = Q_3^*Q_0^*TQ_0Q_3$. Thus

$$Q_0^*(T - s_1I)Q_0R_0 = Q_0^*(T - s_1I)(T - s_0I) = Q_0^*Q_2R_2 = Q_3R_3R_0,$$

and therefore, by uniqueness of the $Q_+R_+$ factorization, $Q_0^*Q_2 = Q_3$ and $F_{s_1}(F_{s_0}(T)) = Q_2^*TQ_2$. Since $Q_2$ is symmetric in $s_0$ and $s_1$, also $F_{s_0}(F_{s_1}(T)) = Q_2^*TQ_2$. ■
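Proposition 2.6 can be spot-checked numerically. The sketch below applies two signed steps in both orders, using shifts outside the spectrum of the test matrix, and compares the results. The helper names are illustrative.

```python
import numpy as np

def qr_plus(M):
    """M = Q R with Q in SO(n), first n-1 diagonal entries of R positive."""
    Q, R = np.linalg.qr(M)
    d = np.where(np.diag(R) < 0, -1.0, 1.0)
    d[-1] = 1.0
    Q, R = Q * d, d[:, None] * R
    if np.linalg.det(Q) < 0:
        Q[:, -1] *= -1.0
        R[-1, :] *= -1.0
    return Q, R

def signed_step(T, s):
    """F_s(T) = Q+^* T Q+ with T - sI = Q+ R+."""
    Q, _ = qr_plus(T - s * np.eye(T.shape[0]))
    return Q.T @ T @ Q

def commutator_gap(T, s0, s1):
    """max |F_{s0}(F_{s1}(T)) - F_{s1}(F_{s0}(T))| (zero up to round-off)."""
    A = signed_step(signed_step(T, s1), s0)
    B = signed_step(signed_step(T, s0), s1)
    return np.max(np.abs(A - B))
```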
3 Simple shift strategies



The point of using a shift strategy is to accelerate deflation, ideally by choosing $s$ near an eigenvalue of $T$. A simple shift strategy is an $\mathcal{E}$-invariant function $\sigma : \mathcal{T}_\Lambda \to \mathbb{R}$ for which there exists $C_\sigma > 0$ such that, for all $T \in \mathcal{T}_\Lambda$, there is an eigenvalue $\lambda_i$ with $|\sigma(T) - \lambda_i| \le C_\sigma|b(T)|$. In particular, if $T \in \mathcal{D}_{\Lambda,0}$ then $\sigma(T) = \lambda_j$ for some $j$.

The step associated with a (simple) shift strategy $\sigma$ is $F_\sigma$, defined by $F_\sigma(T) = F_{\sigma(T)}(T)$. The natural domain for $F_\sigma$ is the set of matrices $T$ for which $T - \sigma(T)I$ is almost invertible. From Section 2, it includes all unreduced matrices and open neighborhoods of each deflation set $\mathcal{D}^i_{\Lambda,0}$. We shall also see in Section 6 that it contains a dense open subset of $\mathcal{T}_\Lambda$ invariant under $F_\sigma$. A more careful description of this domain will not be needed.

Quoting Parlett ([H]), there are shifts for all seasons. Let $\rho$ be Rayleigh's shift: $\rho(T) = (T)_{n,n}$. Denote the bottom $2 \times 2$ diagonal principal minor of a matrix $T \in \mathcal{T}$ by $\hat T$. Wilkinson's shift $\omega(T)$ is the eigenvalue of $\hat T$ closer to $(T)_{n,n}$ (in case of a draw, take the smaller eigenvalue).

Lemma 3.1 The functions $\rho$ and $\omega$ are simple shift strategies with $C_\rho = \sqrt{2}$ and $C_\omega = 2\sqrt{2}$.

We use here the Wielandt-Hoffman theorem (for a simple proof using the Toda dynamics, see [4]): if $S, T \in \mathcal{T}$ have eigenvalues $\mu_i$ and $\lambda_i$, respectively, in increasing order, then

$$\sum_i |\mu_i - \lambda_i|^2 \le \mathrm{tr}\big((S - T)^2\big).$$
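The inequality can be checked numerically. The sketch below returns both sides for two symmetric matrices. `np.linalg.eigvalsh` returns eigenvalues in ascending order, which is the pairing the theorem requires. The function name is illustrative.

```python
import numpy as np

def wielandt_hoffman_sides(S, T):
    """Return (sum_i (mu_i - lambda_i)^2, tr((S - T)^2)) for real symmetric S, T,
    where mu_i and lambda_i are the eigenvalues of S and T in increasing order."""
    mu = np.linalg.eigvalsh(S)     # eigvalsh returns ascending eigenvalues
    lam = np.linalg.eigvalsh(T)
    D = S - T
    return float(np.sum((mu - lam) ** 2)), float(np.trace(D @ D))
```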

Proof: Invariance is trivial for $\rho$. For $\omega$, it follows from the fact that changing signs of off-diagonal entries of a $2 \times 2$ matrix does not change its spectrum.

Let $B = e_ne_{n-1}^* + e_{n-1}e_n^*$ and $S = T - b(T)B$, so that $\rho(T) = (T)_{n,n}$ is an eigenvalue of $S$. From the Wielandt-Hoffman theorem, for some $i$,

$$|\rho(T) - \lambda_i| \le \sqrt{2}\,|b(T)|,$$

proving that $C_\rho = \sqrt{2}$. Apply the Wielandt-Hoffman theorem again, now to the $2 \times 2$ trailing principal minors of $S$ and $T$, to deduce that

$$|(T)_{n,n} - \omega(T)| \le \sqrt{2}\,|b(T)|.$$

We thus have $|\omega(T) - \lambda_i| \le 2\sqrt{2}\,|b(T)|$ and $C_\omega = 2\sqrt{2}$, as desired. ■

Another example of a (simple) shift strategy, the mixed Wilkinson-Rayleigh strategy, uses Wilkinson's shift unless the matrix is already near deflation, in which case it uses Rayleigh's:

$$\sigma(T) = \begin{cases} \rho(T), & |(T)_{n,n-1}| < \epsilon, \\ \omega(T), & |(T)_{n,n-1}| \ge \epsilon; \end{cases}$$

here $\epsilon > 0$ is a small constant.
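The three strategies above can be sketched in NumPy as follows. The sketch also includes a helper that measures the ratio appearing in Lemma 3.1. The function names and the default threshold `eps` are illustrative assumptions.

```python
import numpy as np

def rayleigh_shift(T):
    """Rayleigh's shift rho(T) = (T)_{n,n}."""
    return T[-1, -1]

def wilkinson_shift(T):
    """Wilkinson's shift omega(T): the eigenvalue of the trailing 2x2 block
    closer to (T)_{n,n}; a draw goes to the smaller eigenvalue."""
    lo, hi = np.linalg.eigvalsh(T[-2:, -2:])   # ascending order
    t = T[-1, -1]
    return lo if abs(lo - t) <= abs(hi - t) else hi

def mixed_shift(T, eps=1e-3):
    """Mixed Wilkinson-Rayleigh strategy; eps is an illustrative threshold."""
    return rayleigh_shift(T) if abs(T[-1, -2]) < eps else wilkinson_shift(T)

def shift_bound_ratio(T, shift):
    """min_i |shift(T) - lambda_i| / |b(T)|, to compare with C_rho and C_omega."""
    lam = np.linalg.eigvalsh(T)
    return np.min(np.abs(lam - shift(T))) / abs(T[-1, -2])
```

On random symmetric tridiagonal matrices, the ratio should stay below $\sqrt{2}$ for `rayleigh_shift` and below $2\sqrt{2}$ for `wilkinson_shift`, as the proof of Lemma 3.1 shows.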

Simple shift strategies are not required to be continuous, and $\omega$ is definitely not. For a simple shift strategy $\sigma$, let $\mathcal{S}_\sigma \subset \mathcal{T}_\Lambda$ be the singular support of $\sigma$, i.e., the minimal closed set on whose complement $\sigma$ is smooth. For example, $\mathcal{S}_\omega$ is the set of matrices $T \in \mathcal{T}_\Lambda$ for which the two eigenvalues $\omega_-(T)$ and $\omega_+(T)$ of $\hat T$ are equidistant from $(T)_{n,n}$ or, equivalently, for which $(T)_{n,n} = (T)_{n-1,n-1}$. The set $\mathcal{S}_\sigma$ will play an important role later.

We consider the phase portrait of $F_\omega$ for $3 \times 3$ matrices. In this case, the reader may check that the domain of $F_\omega$ is the full set $\mathcal{T}_\Lambda$. Let $\mathcal{J}_\Lambda \subset \mathcal{T}_\Lambda$ be the set of Jacobi matrices similar to $\Lambda$, i.e., matrices $T \in \mathcal{T}_\Lambda$ with strictly positive subdiagonal entries.
Recall that the closure $\bar{\mathcal{J}}_\Lambda \subset \mathcal{T}_\Lambda$ is diffeomorphic to a hexagon, the permutohedron in this dimension. The set $\bar{\mathcal{J}}_\Lambda$ is not invariant under $F_\omega$, but we may define $\tilde F_\omega : \bar{\mathcal{J}}_\Lambda \to \bar{\mathcal{J}}_\Lambda$ by dropping signs of subdiagonal entries of $F_\omega(T)$. As discussed above, this standard procedure is mostly harmless.

Two examples of $\tilde F_\omega$ are given in Figure 3, which represents $\bar{\mathcal{J}}_\Lambda$ for $\Lambda = \mathrm{diag}(1, 2, 4)$ on the left and $\Lambda = \mathrm{diag}(-1, 0, 1)$ on the right. The vertices are the six diagonal matrices similar to $\Lambda$, and the edges consist of reduced matrices. Labels indicate the diagonal entries of the corresponding matrices. Three edges form $\mathcal{D}_{\Lambda,0} \cap \bar{\mathcal{J}}_\Lambda$; they alternate, starting from the bottom horizontal edge on both hexagons. The set $\mathcal{S}_\omega \cap \bar{\mathcal{J}}_\Lambda$ is indicated in both cases.




Figure 3: The phase space of Wilkinson's step for n = 3. 

Vertices are fixed points of $\tilde F_\omega$, and boundary edges are invariant sets. A simple arrow indicates the motion of the points $\tilde F_\omega^k(T)$ along the edge. Points $T$ on an arc with a double arrow are taken to a diagonal matrix in a single step: the arc points to $\tilde F_\omega(T)$. Arcs marked with a transversal segment consist of fixed points of $\tilde F_\omega$.

Points on both sides of $\mathcal{S}_\omega$ are taken far apart: there is a jump discontinuity along $\mathcal{S}_\omega$. From Theorem 2, the decay of the bottom subdiagonal entry under Wilkinson's step away from $\mathcal{S}_\omega \cap \mathcal{D}_{\Lambda,0}$ is cubic. As discussed in [9], near $\mathcal{S}_\omega \cap \mathcal{D}_{\Lambda,0}$ this decay is quadratic, but not cubic. For the left hexagon, cubic convergence occurs in the long run because the sequence $\tilde F_\omega^k(T)$ stays close to this intersection only for a few values of $k$, illustrating Theorem 3.

In the case $\Lambda = \mathrm{diag}(-1, 0, 1)$, the bottom edge consists of fixed points. As mentioned in the Introduction (see Figure 2), this case has a special asymptotic behavior: the (fixed) point labeled by $(0, 0, 0)$ is the central point of the set $\mathcal{X}$. If $T \in \mathcal{X}$, then the sequence $(\tilde F_\omega^k(T))$ is contained in $\mathcal{X}$ and converges to the central point at a strictly quadratic rate.

4 Tubular coordinates 

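Recall that a map $\Pi : X \to Y \subset X$ is a projection if $\Pi(X) = Y$ and $\Pi \circ \Pi = \Pi$. Instead of using abstract topological facts to prove the existence of some projection $\mathcal{D}^i_\Lambda \to \mathcal{D}^i_{\Lambda,0}$, we prefer to construct a specific projection which works well with the QR steps. The map $F_{\lambda_i} : \mathcal{D}^i_\Lambda \to \mathcal{D}^i_{\Lambda,0}$ is not a projection. Using Proposition 2.5, however, it can be used to define one, the canonical projection $\Pi_i : \mathcal{D}^i_\Lambda \to \mathcal{D}^i_{\Lambda,0}$:

$$\Pi_i(T) = \left(F_{\lambda_i}|_{\mathcal{D}^i_{\Lambda,0}}\right)^{-1}\big(F_{\lambda_i}(T)\big).$$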

Proposition 4.1 The map $\Pi_i$ is a smooth projection which commutes with steps: $\Pi_i(F_s(T)) = F_s(\Pi_i(T))$, provided $s$ is not an eigenvalue of $\Lambda$ different from $\lambda_i$.

Proof: The map $\Pi_i$ is clearly smooth and, for $T \in \mathcal{D}^i_{\Lambda,0}$, we have

$$\Pi_i(T) = \left(F_{\lambda_i}|_{\mathcal{D}^i_{\Lambda,0}}\right)^{-1}\big(F_{\lambda_i}(T)\big) = T,$$

proving that $\Pi_i$ is a projection. Commutativity follows from Proposition 2.6. ■

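For a diagonal matrix $\Lambda$ with simple spectrum and $\epsilon > 0$, the deflation neighborhood $\mathcal{D}_{\Lambda,\epsilon} \subset \mathcal{T}_\Lambda$ is the closed set of matrices $T \in \mathcal{T}_\Lambda$ with $|b(T)| \le \epsilon$. This notation is consistent with $\mathcal{D}_{\Lambda,0}$ for the deflation set. As we shall see (Proposition 4.2 and Section 5), for sufficiently small $\epsilon > 0$ the set $\mathcal{D}_{\Lambda,\epsilon}$ has connected components $\mathcal{D}^i_{\Lambda,\epsilon}$, with $\mathcal{D}^i_{\Lambda,0} \subset \mathcal{D}^i_{\Lambda,\epsilon} \subset \mathcal{D}^i_\Lambda$. These components are invariant under steps $F_s$ for shifts $s$ near $\lambda_i$, i.e., $F_s(\mathcal{D}^i_{\Lambda,\epsilon}) \subset \mathcal{D}^i_{\Lambda,\epsilon}$. The sets $\mathcal{D}^i_{\Lambda,\epsilon}$ are therefore also invariant under $F_\sigma$.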

Denote the distance between a matrix $T$ and a compact set of matrices $\mathcal{N}$ by $\mathrm{dist}(T, \mathcal{N}) = \min_{S \in \mathcal{N}}\|T - S\|$. Let $\gamma = \min_{i \ne j}|\lambda_i - \lambda_j|$ be the spectral gap of $\Lambda$, and let $B = e_ne_{n-1}^* + e_{n-1}e_n^*$.

Recall that if $\mathcal{N}$ is a submanifold of codimension $k$ of $\mathcal{M}$, then a closed tubular neighborhood of $\mathcal{N}$ consists of a closed neighborhood $\mathcal{N}_\epsilon$ of $\mathcal{N}$ and a diffeomorphism $\zeta : \mathcal{N}_\epsilon \to \mathcal{N} \times B^k_\epsilon$ with $\zeta(x) = (x, 0)$ for $x \in \mathcal{N}$ (here $B^k_\epsilon \subset \mathbb{R}^k$ is the closed ball of radius $\epsilon$ around the origin). Given $x \in \mathcal{N}$, the preimage $\zeta^{-1}(\{x\} \times B^k_\epsilon)$ is a manifold with boundary of dimension $k$, the fiber through $x$. We now construct tubular neighborhoods of the deflation sets $\mathcal{D}^i_{\Lambda,0}$; here the codimension is $k = 1$.

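Proposition 4.2 Each $\mathcal{D}^i_{\Lambda,0} \subset \mathcal{T}_\Lambda$ is a compact submanifold of codimension 1 diffeomorphic to $\mathcal{T}_{\hat\Lambda_i}$, where $\hat\Lambda_i = \operatorname{diag}(\lambda_1, \ldots, \lambda_{i-1}, \lambda_{i+1}, \ldots, \lambda_n)$. There exists $\epsilon_{tub} > 0$ such that for $\epsilon \in (0, \epsilon_{tub})$: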

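(a) the connected components $\mathcal{D}^i_{\Lambda,\epsilon}$ of $\mathcal{D}_{\Lambda,\epsilon}$ consist of the matrices $T \in \mathcal{D}_{\Lambda,\epsilon}$ for which $|(T)_{n,n} - \lambda_i| \le \sqrt{2}\epsilon$;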

(b) the map $\zeta: \mathcal{D}^i_{\Lambda,\epsilon} \to \mathcal{D}^i_{\Lambda,0} \times [-\epsilon, \epsilon]$ given by $\zeta(T) = (\Pi_i(T), b(T))$ is a closed tubular neighborhood of $\mathcal{D}^i_{\Lambda,0}$;

(c) there is a constant $C_b > 0$ such that for all $T \in \mathcal{D}^i_{\Lambda,\epsilon}$,

$$|b(T)| \le \operatorname{dist}(T, \mathcal{D}^i_{\Lambda,0}) \le \|T - \Pi_i(T)\| \le C_b|b(T)|.$$

Proof: We first show that the gradient of the restriction $b|_{\mathcal{T}_\Lambda}$ at a point $T_{\mathcal{D}} \in \mathcal{D}_{\Lambda,0}$ is not zero. Consider the characteristic polynomial along the line $T_{\mathcal{D}} + tB$: it is a smooth even function of t, so its derivative at t = 0 vanishes, and therefore B is tangent to $\mathcal{T}_\Lambda$ at $T_{\mathcal{D}}$, the point at which t = 0. On the other hand, the directional derivative of b along the same line equals 1. Thus $\mathcal{D}_{\Lambda,0} \subset \mathcal{T}_\Lambda$ is a submanifold of codimension 1. The diffeomorphism with $\mathcal{T}_{\hat\Lambda_i}$ takes T to $\hat T$, the leading $(n-1) \times (n-1)$ principal minor of T.
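The evenness in t used above can be read off the standard three-term recurrence for characteristic polynomials of symmetric tridiagonal matrices, in which each subdiagonal entry enters only through its square. As a worked sketch, with illustrative notation introduced only here ($a_k = (T)_{k,k}$, $b_k = (T)_{k+1,k}$, and $p_k(x)$ the determinant of the leading $k \times k$ principal minor of $T - xI$):

```latex
p_0(x) = 1, \qquad p_1(x) = a_1 - x, \qquad
p_k(x) = (a_k - x)\,p_{k-1}(x) - b_{k-1}^2\,p_{k-2}(x), \quad k = 2, \ldots, n.
% Along T_D + tB only the (n, n-1) and (n-1, n) entries change, and b_{n-1} = t since b(T_D) = 0:
\det(T_{\mathcal{D}} + tB - xI) = (a_n - x)\,p_{n-1}(x) - t^2\,p_{n-2}(x).
```

The right-hand side depends on t only through $t^2$, so its derivative at t = 0 vanishes.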

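Assume $\epsilon < \gamma/(2\sqrt{2})$. Consider matrices $T \in \mathcal{D}_{\Lambda,\epsilon}$ and $S = T - b(T)B$, so that $(T)_{n,n}$ is an eigenvalue of S. Since $\|T - S\|_F = \sqrt{2}|b(T)| \le \sqrt{2}\epsilon$, the Wielandt-Hoffman theorem gives an index i for which $|(T)_{n,n} - \lambda_i| \le \sqrt{2}\epsilon$; as $\sqrt{2}\epsilon < \gamma/2$, this index is unique, which defines the sets $\mathcal{D}^i_{\Lambda,\epsilon}$ (at this point we do not yet know that $\mathcal{D}^i_{\Lambda,\epsilon}$ is connected).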
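The Wielandt-Hoffman step can be checked numerically on a small example. The sketch below is a minimal illustration, not code from the paper: it assumes an example 3 × 3 matrix chosen for illustration and computes eigenvalues by Sturm-sequence bisection (a standard technique for symmetric tridiagonal matrices that the text does not use), with hypothetical helper names `sturm_count` and `tridiagonal_eigenvalues`. It confirms that $(T)_{n,n}$ lies within $\sqrt{2}|b(T)|$ of the spectrum.

```python
import math

def sturm_count(diag, off, x):
    """Hypothetical helper: number of eigenvalues smaller than x of the symmetric
    tridiagonal matrix with diagonal `diag` and subdiagonal `off`, obtained by
    counting negative pivots of the LDL^T factorization of T - xI."""
    count = 0
    d = 1.0
    for k in range(len(diag)):
        d = diag[k] - x - (off[k - 1] ** 2 / d if k > 0 else 0.0)
        if d == 0.0:
            d = -1e-300  # perturb an exact zero pivot so the recurrence can continue
        if d < 0.0:
            count += 1
    return count

def tridiagonal_eigenvalues(diag, off, iters=200):
    """Hypothetical helper: all eigenvalues, in increasing order, by bisection."""
    n = len(diag)
    radius = [(abs(off[k - 1]) if k > 0 else 0.0) + (abs(off[k]) if k < n - 1 else 0.0)
              for k in range(n)]
    lo0 = min(diag[k] - radius[k] for k in range(n)) - 1.0  # Gershgorin bounds
    hi0 = max(diag[k] + radius[k] for k in range(n)) + 1.0
    eigs = []
    for k in range(n):
        lo, hi = lo0, hi0  # invariant: sturm_count(lo) <= k < sturm_count(hi)
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if sturm_count(diag, off, mid) > k:
                hi = mid
            else:
                lo = mid
        eigs.append(0.5 * (lo + hi))
    return eigs

# Illustrative example (assumption): the last subdiagonal entry b shrinks.
diag = [2.0, 3.0, 5.0]
for b in (0.3, 0.1, 0.01):
    lams = tridiagonal_eigenvalues(diag, [1.0, b])
    dist = min(abs(diag[-1] - lam) for lam in lams)
    print(f"b = {b}: distance from (T)_nn to the spectrum {dist:.2e} <= {math.sqrt(2) * b:.2e}")
```

In practice the distance is much smaller than $\sqrt{2}|b(T)|$ (of order $b(T)^2$ here); the proof only needs the cruder bound.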

For $T_{\mathcal{D}} \in \mathcal{D}^i_{\Lambda,0}$, the derivative $D\Pi_i(T_{\mathcal{D}})$ equals the identity on the subspace tangent to $\mathcal{D}^i_{\Lambda,0}$ and has a kernel of dimension 1. Thus, for sufficiently small $\epsilon_{tub}$, item (b) holds. This also proves that each $\mathcal{D}^i_{\Lambda,\epsilon}$ is connected, completing the proof of item (a).

The first two inequalities in (c) are trivial. Now

$$\|T - \Pi_i(T)\| = \|\zeta^{-1}(\Pi_i(T), b(T)) - \zeta^{-1}(\Pi_i(T), 0)\| \le C_b|b(T)|,$$

where the derivative of $\zeta^{-1}(T_{\mathcal{D}}, b)$ with respect to the second coordinate is bounded by $C_b$ on the compact set $\mathcal{D}^i_{\Lambda,0} \times [-\epsilon_{tub}, \epsilon_{tub}]$. ■

The diffeomorphism ζ defines tubular coordinates for $T \in \mathcal{D}^i_{\Lambda,\epsilon}$: the matrix $\Pi_i(T) \in \mathcal{D}^i_{\Lambda,0} \approx \mathcal{T}_{\hat\Lambda_i}$ and the number b(T). Under tubular coordinates, QR steps with shift are given by a simple formula.
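Corollary 4.3 Consider $\lambda_i$ and $\epsilon \in (0, \epsilon_{tub})$. Then

$$\zeta \circ F_s \circ \zeta^{-1}: \mathcal{D}^i_{\Lambda,0} \times [-\epsilon, \epsilon] \to \mathcal{D}^i_{\Lambda,0} \times [-\epsilon, \epsilon],$$

$$(T_{\mathcal{D}}, b) \mapsto \Big(F_s(T_{\mathcal{D}}),\ \frac{(R_+)_{n,n}}{(R_+)_{n-1,n-1}}\, b\Big),$$

where $\zeta^{-1}(T_{\mathcal{D}}, b) - sI = Q_+R_+$.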

Proof: This follows directly from Lemma 2.2 and Propositions 4.1 and 4.2. ■
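The relation between $b(F_s(T))$ and $b(T)$ in Corollary 4.3 can be checked on a small example. This is a minimal sketch, assuming an explicit (not implicit) shifted QR step computed with Givens rotations; the helper `qr_step` is hypothetical and not code from the paper. Givens rotations do not normalize the sign of the last diagonal entry of R, so the sketch uses its own factors: for tridiagonal T the factor Q is upper Hessenberg, and then $b(F_s(T)) = \frac{R_{n,n}}{R_{n-1,n-1}}\, b(T)$ holds for that factorization. The example matrix has spectrum {0, 1, 3}, so the shift s = 3 is an eigenvalue and the step deflates.

```python
import math

def qr_step(T, s):
    """Hypothetical helper: explicit shifted QR step. Factor T - sI = QR with
    Givens rotations and return (RQ + sI, R). For tridiagonal T the factor
    Q = G_1^T ... G_{n-1}^T is upper Hessenberg; R[n-1][n-1] may be <= 0."""
    n = len(T)
    A = [[T[i][j] - (s if i == j else 0.0) for j in range(n)] for i in range(n)]
    rotations = []
    for k in range(n - 1):
        a, b = A[k][k], A[k + 1][k]
        r = math.hypot(a, b)
        c, sn = (1.0, 0.0) if r == 0.0 else (a / r, b / r)
        for j in range(n):  # rows k, k+1 of G_k A; this zeroes A[k+1][k]
            x, y = A[k][j], A[k + 1][j]
            A[k][j], A[k + 1][j] = c * x + sn * y, -sn * x + c * y
        rotations.append((c, sn))
    R = [row[:] for row in A]
    for k, (c, sn) in enumerate(rotations):  # right-multiply R by G_k^T
        for i in range(n):
            x, y = A[i][k], A[i][k + 1]
            A[i][k], A[i][k + 1] = c * x + sn * y, -sn * x + c * y
    for i in range(n):
        A[i][i] += s
    return A, R

T = [[1.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 1.0]]  # example with spectrum {0, 1, 3}

T1, R = qr_step(T, 0.4)
predicted = R[2][2] / R[1][1] * T[2][1]
print(T1[2][1], predicted)              # agree up to rounding

T_exact, _ = qr_step(T, 3.0)            # shift equal to an eigenvalue
print(T_exact[2][1], T_exact[2][2])     # approximately 0 and 3
```

The exact-shift case is the statement $\varphi(T, \lambda_i) = 0$ used in the proof of Theorem 2: the factor $R_{n,n}$ vanishes.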

5 Convergence to deflation 

Sufficiently thin deflation neighborhoods $\mathcal{D}^i_{\Lambda,\epsilon}$ are invariant under $F_s$ for $s \approx \lambda_i$.

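Proposition 5.1 Given C > 0, there exists $\epsilon_{inv} \in (0, \epsilon_{tub})$ such that for any $\epsilon \in (0, \epsilon_{inv})$ and $s \in [\lambda_i - C\epsilon, \lambda_i + C\epsilon]$ we have $F_s(\mathcal{D}^i_{\Lambda,\epsilon}) \subset \operatorname{int}(\mathcal{D}^i_{\Lambda,\epsilon/2})$.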

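For a simple shift strategy $\sigma: \mathcal{T}_\Lambda \to \mathbb{R}$, there exists $\epsilon_{inv} > 0$ such that if $\epsilon \in (0, \epsilon_{inv})$ then $F_\sigma(\mathcal{D}^i_{\Lambda,\epsilon}) \subset \operatorname{int}(\mathcal{D}^i_{\Lambda,\epsilon/2})$.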

In particular, $F_s$ is well defined in $\mathcal{D}^i_{\Lambda,\epsilon}$ for $\epsilon \in (0, \epsilon_{inv})$.
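Proof: Recall that $F_{\lambda_i}(\mathcal{D}^i_{\Lambda,0}) = \mathcal{D}^i_{\Lambda,0}$. From Lemma 2.3, the derivative of $b \circ F$, where $F(T,s) = F_s(T)$, is zero at $\mathcal{D}^i_{\Lambda,0} \times \{\lambda_i\}$. Compactness of $\mathcal{D}^i_{\Lambda,0}$ thus implies that in a sufficiently small neighborhood of $\mathcal{D}^i_{\Lambda,0} \times \{\lambda_i\}$ we have $|b(F_s(T))| \le |b(T)|/3$.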

Now consider a simple shift strategy σ: there exists $C_\sigma > 0$ such that $|\sigma(T) - \lambda_i| \le C_\sigma|b(T)|$; apply the first statement with $C = C_\sigma$. ■

Thus, $F_\sigma$ squeezes the neighborhoods $\mathcal{D}^i_{\Lambda,\epsilon}$ at least linearly. Equivariance and smoothness imply an estimate stronger than the one in the definition of a simple shift strategy. We do not want to assume, however, that $\mathcal{D}_{\Lambda,0} \cap \mathcal{S}_\sigma = \emptyset$: after all, this is not true even for Wilkinson's shift. We need a more careful statement.

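Lemma 5.2 Consider a shift strategy σ and $\epsilon_{inv}$ as in Proposition 5.1. For a compact set $\mathcal{K} \subset \mathcal{D}^i_{\Lambda,\epsilon_{inv}} \setminus (\mathcal{D}^i_{\Lambda,0} \cap \mathcal{S}_\sigma)$, there exists $C_{\mathcal{K}}$ such that for all $T \in \mathcal{K}$ we have $|\sigma(T) - \lambda_i| \le C_{\mathcal{K}}\, b(T)^2$.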

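Proof: Let $\mathcal{K}_{\mathcal{D}} = \mathcal{K} \cap \mathcal{D}^i_{\Lambda,0}$; enlarge $\mathcal{K}_{\mathcal{D}}$ along $\mathcal{D}^i_{\Lambda,0}$ to obtain another compact set $\hat{\mathcal{K}}_1 \subset \mathcal{D}^i_{\Lambda,0} \setminus \mathcal{S}_\sigma$ with $\mathcal{K}_{\mathcal{D}} \subset \operatorname{int}_{\mathcal{D}^i_{\Lambda,0}}(\hat{\mathcal{K}}_1)$. Fatten $\hat{\mathcal{K}}_1$ along fibers to define $\mathcal{K}_1 = \zeta^{-1}(\hat{\mathcal{K}}_1 \times [-\epsilon, \epsilon])$, $\epsilon \in (0, \epsilon_{inv})$, which, without loss, still avoids $\mathcal{S}_\sigma$. For each $T_{\mathcal{D}} \in \hat{\mathcal{K}}_1$, consider the function $h_{T_{\mathcal{D}}}(b) = \sigma(\zeta^{-1}(T_{\mathcal{D}}, b))$, obtained by restricting σ to a fiber of $\mathcal{D}^i_{\Lambda,\epsilon}$. Each $h_{T_{\mathcal{D}}}$ is smooth and even and therefore satisfies $|h_{T_{\mathcal{D}}}(b) - \lambda_i| \le C_{T_{\mathcal{D}}}|b|^2$. By compactness, there exists $C_{\mathcal{K}_1}$ such that $|h_{T_{\mathcal{D}}}(b) - \lambda_i| \le C_{\mathcal{K}_1}|b|^2$ for all $T_{\mathcal{D}} \in \hat{\mathcal{K}}_1$. In other words, $|\sigma(T) - \lambda_i| \le C_{\mathcal{K}_1}|b(T)|^2$ for all $T \in \mathcal{K}_1$.

The estimate for $T \in \mathcal{K} \setminus \mathcal{K}_1$ is trivial. ■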

Proof of Theorem 2: Take $\epsilon = \epsilon_{inv}$ as in Proposition 5.1, so that $\mathcal{D}^i_{\Lambda,\epsilon}$ is invariant under $F_\sigma$.

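Let $\varphi = b \circ F$. We compute the Taylor expansion of $\varphi(T,s)$ at $(T_{\mathcal{D}}, \lambda_i)$, $T_{\mathcal{D}} \in \mathcal{D}^i_{\Lambda,0}$: from Lemma 2.3, the gradient of φ at $(T_{\mathcal{D}}, \lambda_i)$ is zero. Thus, up to a third order remainder,

$$\begin{aligned}\varphi(T,s) = {}&\varphi(T_{\mathcal{D}},\lambda_i) + \tfrac{1}{2}\varphi_{T,T}(T_{\mathcal{D}},\lambda_i)(T - T_{\mathcal{D}}, T - T_{\mathcal{D}})\\ &+ \varphi_{T,s}(T_{\mathcal{D}},\lambda_i)(T - T_{\mathcal{D}}, s - \lambda_i) + \tfrac{1}{2}\varphi_{s,s}(T_{\mathcal{D}},\lambda_i)(s - \lambda_i, s - \lambda_i)\\ &+ \operatorname{Rem}_3(T - T_{\mathcal{D}}, s - \lambda_i).\end{aligned}$$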
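Now, $\varphi(T_{\mathcal{D}}, \lambda_i) = 0$ and, again from Lemma 2.3, $\varphi(T, \lambda_i) = 0$ for all $T \in \mathcal{T}_\Lambda$; hence $\varphi_{T,T}(T_{\mathcal{D}}, \lambda_i) = 0$. Let $C_\sigma$ be the constant in the definition of a simple shift strategy. By compactness, there exists $C_1 > 0$ such that for all $T_{\mathcal{D}} \in \mathcal{D}^i_{\Lambda,0}$, $T \in \mathcal{D}^i_{\Lambda,\epsilon}$ and $s \in [\lambda_i - C_\sigma\epsilon, \lambda_i + C_\sigma\epsilon]$, we have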

$$|\varphi(T,s)| \le C_1|s - \lambda_i|\,(\|T - T_{\mathcal{D}}\| + |s - \lambda_i|).$$

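We now apply this estimate for $T_{\mathcal{D}} = \Pi_i(T)$, where $T \in \mathcal{D}^i_{\Lambda,\epsilon}$. By Proposition 4.2, since $\epsilon < \epsilon_{tub}$, $\|T - T_{\mathcal{D}}\| = \|T - \Pi_i(T)\| \le C_b|b(T)|$ and therefore

$$|\varphi(T,s)| \le C_1|s - \lambda_i|\,(C_b|b(T)| + |s - \lambda_i|),$$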

implying the quadratic estimate

$$|b(F_\sigma(T))| = |\varphi(T,\sigma(T))| \le C_1|\sigma(T) - \lambda_i|\,(C_b|b(T)| + |\sigma(T) - \lambda_i|) \le C_2|b(T)|^2.$$

Using Lemma 5.2 yields the cubic estimate in (c). ■
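The cubic rate for Rayleigh's strategy $\sigma(T) = (T)_{n,n}$, a smooth simple shift strategy, can be observed numerically. This is a minimal sketch, assuming the same explicit Givens-based step as a hypothetical helper `qr_step` and an example 4 × 4 matrix chosen for illustration; it records $|b(F^k_\sigma(T))|$ along the orbit. Once $|b|$ reaches the level of rounding errors the ratios $|b_{k+1}|/|b_k|^3$ stop being meaningful, so only the first iterates illustrate the estimate.

```python
import math

def qr_step(T, s):
    """Hypothetical helper: explicit shifted QR step via Givens rotations,
    T - sI = QR, returning (RQ + sI, R)."""
    n = len(T)
    A = [[T[i][j] - (s if i == j else 0.0) for j in range(n)] for i in range(n)]
    rotations = []
    for k in range(n - 1):
        a, b = A[k][k], A[k + 1][k]
        r = math.hypot(a, b)
        c, sn = (1.0, 0.0) if r == 0.0 else (a / r, b / r)
        for j in range(n):
            x, y = A[k][j], A[k + 1][j]
            A[k][j], A[k + 1][j] = c * x + sn * y, -sn * x + c * y
        rotations.append((c, sn))
    R = [row[:] for row in A]
    for k, (c, sn) in enumerate(rotations):
        for i in range(n):
            x, y = A[i][k], A[i][k + 1]
            A[i][k], A[i][k + 1] = c * x + sn * y, -sn * x + c * y
    for i in range(n):
        A[i][i] += s
    return A, R

def rayleigh_shift(T):
    """Rayleigh's strategy: sigma(T) = (T)_{n,n}."""
    return T[-1][-1]

# Illustrative example (assumption): small last subdiagonal entry b = 0.1.
T = [[1.0, 1.0, 0.0, 0.0],
     [1.0, 2.0, 1.0, 0.0],
     [0.0, 1.0, 3.0, 0.1],
     [0.0, 0.0, 0.1, 5.0]]
history = [abs(T[-1][-2])]            # |b(F^k(T))| for k = 0, 1, ...
for _ in range(6):
    T, _ = qr_step(T, rayleigh_shift(T))
    history.append(abs(T[-1][-2]))

for k in range(len(history) - 1):
    if history[k] > 1e-5:             # skip ratios at rounding level
        print(k, history[k + 1], history[k + 1] / history[k] ** 3)
```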

As a corollary, we obtain the well known fact that, near deflation, Rayleigh's strategy (as well as the mixed Wilkinson-Rayleigh strategy) converges cubically. The rate of convergence for Wilkinson's strategy is subtler.

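We construct a larger invariant set for $F_\sigma$. Let $\mathcal{U}_\Lambda \subset \mathcal{T}_\Lambda$ be the set of unreduced matrices; for ε > 0, let $\mathcal{U}_{\Lambda,\epsilon} = \mathcal{U}_\Lambda \cup \operatorname{int}(\mathcal{D}_{\Lambda,\epsilon})$. Notice that $\mathcal{U}_{\Lambda,\epsilon}$ is open, dense and path-connected.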



Lemma 5.3 For a shift strategy $\sigma: \mathcal{T}_\Lambda \to \mathbb{R}$, $\epsilon_{inv}$ as in Proposition 5.1 and $\epsilon \in (0, \epsilon_{inv})$, the open set $\mathcal{U}_{\Lambda,\epsilon}$ is invariant under $F_\sigma$.

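Proof: If $T \in \mathcal{U}_\Lambda$ and σ(T) is not in the spectrum, then $F_\sigma(T)$ is (well defined and) unreduced. If $T \in \mathcal{U}_\Lambda$ and $\sigma(T) = \lambda_i$, then $F_\sigma(T) \in \mathcal{D}^i_{\Lambda,0} \subset \mathcal{U}_{\Lambda,\epsilon}$. Finally, if $T \in \operatorname{int}(\mathcal{D}^i_{\Lambda,\epsilon})$ then, by Proposition 5.1, $F_\sigma(T) \in \operatorname{int}(\mathcal{D}^i_{\Lambda,\epsilon/2}) \subset \mathcal{U}_{\Lambda,\epsilon}$. ■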

Notice that we do not assume σ or $F_\sigma$ to be continuous. This shows that, for $F_\sigma$ defined from a simple shift strategy σ, the extra hypothesis in Theorem 1, item (b), actually holds: just take $\mathcal{U} = \mathcal{U}_{\Lambda,\epsilon}$.

A simple shift strategy σ is deflationary if for any $T \in \mathcal{U}_{\Lambda,\epsilon_{inv}}$ there exists $k \in \mathbb{N}$ such that $F^k_\sigma(T) \in \mathcal{D}_{\Lambda,\epsilon_{inv}}$. It is now a corollary of Theorem 1 and Lemma 5.3 that continuous simple shift strategies are not deflationary.

Rayleigh's strategy is known not to be deflationary. The following well known estimate ([7] and [14], section 8-10) implies that Wilkinson's strategy is not only deflationary but uniformly so, in the sense that there exists K with $F^K_\omega(\mathcal{U}_{\Lambda,\epsilon_{inv}}) \subset \mathcal{D}_{\Lambda,\epsilon_{inv}}$. As a corollary, the mixed Wilkinson-Rayleigh strategy is also uniformly deflationary provided ε > 0 is sufficiently small.
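The difference between Rayleigh's and Wilkinson's strategies already shows up for 2 × 2 matrices. This is a minimal sketch, assuming the standard formula for Wilkinson's shift (the eigenvalue of the trailing 2 × 2 block closer to $(T)_{n,n}$, with a tie broken toward $c - |b|$) and the same hypothetical Givens-based helper `qr_step`; the helper names are illustrative. On $T = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}$ Rayleigh's shift is 0 and the orbit never deflates, while Wilkinson's shift is an eigenvalue and deflation happens in one step.

```python
import math

def qr_step(T, s):
    """Hypothetical helper: explicit shifted QR step via Givens rotations,
    T - sI = QR, returning (RQ + sI, R)."""
    n = len(T)
    A = [[T[i][j] - (s if i == j else 0.0) for j in range(n)] for i in range(n)]
    rotations = []
    for k in range(n - 1):
        a, b = A[k][k], A[k + 1][k]
        r = math.hypot(a, b)
        c, sn = (1.0, 0.0) if r == 0.0 else (a / r, b / r)
        for j in range(n):
            x, y = A[k][j], A[k + 1][j]
            A[k][j], A[k + 1][j] = c * x + sn * y, -sn * x + c * y
        rotations.append((c, sn))
    R = [row[:] for row in A]
    for k, (c, sn) in enumerate(rotations):
        for i in range(n):
            x, y = A[i][k], A[i][k + 1]
            A[i][k], A[i][k + 1] = c * x + sn * y, -sn * x + c * y
    for i in range(n):
        A[i][i] += s
    return A, R

def wilkinson_shift(T):
    """Eigenvalue of the trailing 2x2 block [[a, b], [b, c]] closer to c
    (convention sign(0) = +1, so a tie picks c - |b|)."""
    a, b, c = T[-2][-2], T[-1][-2], T[-1][-1]
    delta = (a - c) / 2.0
    denom = abs(delta) + math.hypot(delta, b)
    if denom == 0.0:
        return c
    sign = 1.0 if delta >= 0.0 else -1.0
    return c - sign * b * b / denom

def orbit_b(T, shift, steps):
    """|b| along the orbit of T under F_sigma, sigma = shift."""
    history = [abs(T[-1][-2])]
    for _ in range(steps):
        T, _ = qr_step(T, shift(T))
        history.append(abs(T[-1][-2]))
    return history

T0 = [[0.0, 1.0], [1.0, 0.0]]                   # eigenvalues -1 and 1
rayleigh = orbit_b(T0, lambda T: T[-1][-1], 10)
wilkinson = orbit_b(T0, wilkinson_shift, 1)
print(rayleigh)    # stays at 1.0: no deflation
print(wilkinson)   # drops to 0 after one step
```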



Fact 5.4 For $T \in \mathcal{T}$ and $k \in \mathbb{N}$,

|6( ^ (T))I - — (TW 1 — ■ 

In [14], the result is shown for unreduced matrices; the case $T \in \mathcal{U}_{\Lambda,\epsilon_{inv}}$ follows by taking limits. Notice that for $T \in \mathcal{T}_\Lambda$ the numerator $|b(T)^2(T)_{n-1,n-2}|$ is uniformly bounded.
6 Dynamics for a.p. free spectra



From the previous section, cubic convergence may be lost when the orbit $F^k_\sigma(T)$ passes near the set $\mathcal{S}_\sigma \cap \mathcal{D}_{\Lambda,0}$. Our next task is to measure when this happens, by studying the dynamics associated to a shift strategy in a deflation neighborhood, i.e., the iterates of $F_\sigma: \mathcal{D}^i_{\Lambda,\epsilon} \to \mathcal{D}^i_{\Lambda,\epsilon}$, $\epsilon \in (0, \epsilon_{inv})$. Most of what we need can be read in the projection onto $\mathcal{D}^i_{\Lambda,0}$, where $F_\sigma$ coincides with $F_{\lambda_i}$.

A matrix $T \in \mathcal{T}$ with simple spectrum is a.p. free if no three eigenvalues are in arithmetic progression, and a.p. otherwise. Different kinds of spectra lead to different dynamics: in this section we handle the a.p. free case, clearly a generic restriction. Let $\hat T$ be the leading principal $(n-1) \times (n-1)$ minor of T. The following result is standard.

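Proposition 6.1 Let $\Lambda \in \mathcal{T}$ be an n × n diagonal a.p. free matrix with spectrum $\lambda_1 < \cdots < \lambda_n$. For each i, consider $F_{\lambda_i}: \mathcal{D}^i_{\Lambda,0} \to \mathcal{D}^i_{\Lambda,0}$ as above. For any $T \in \mathcal{D}^i_{\Lambda,0}$, the sequence $(F^k_{\lambda_i}(T))$ converges to a diagonal matrix.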

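Proof: The map $F_{\lambda_i}$ on $\mathcal{D}^i_{\Lambda,0}$ amounts to a QR step with shift $\lambda_i$ on $\hat T$, which has eigenvalues $\lambda_j$, $j \ne i$. The a.p. free hypothesis implies that the absolute values of the eigenvalues of $\hat T - \lambda_iI$ are distinct (if $|\lambda_j - \lambda_i| = |\lambda_k - \lambda_i|$ with $j \ne k$, then $\lambda_i$ would be the midpoint of $\lambda_j$ and $\lambda_k$). If $\hat T$ is unreduced then, as is well known, the standard QR iteration converges to a diagonal matrix, with diagonal entries in decreasing order of absolute value. More generally, if $\hat T$ is reduced, apply the above result to each unreduced sub-block. ■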
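The convergence in Proposition 6.1 is that of a QR iteration with a fixed shift. This is a minimal sketch, assuming the same hypothetical Givens-based `qr_step`, an illustrative unreduced 3 × 3 matrix with spectrum {0, 1, 3} in the role of $\hat T$, and a fixed shift s = 1.8 in the role of $\lambda_i$ (the set {0, 1, 1.8, 3} is a.p. free). The distances |λ - s| are 1.8, 0.8 and 1.2, so the diagonal should converge to (0, 3, 1), in decreasing order of |λ - s|.

```python
import math

def qr_step(T, s):
    """Hypothetical helper: explicit shifted QR step via Givens rotations,
    T - sI = QR, returning (RQ + sI, R)."""
    n = len(T)
    A = [[T[i][j] - (s if i == j else 0.0) for j in range(n)] for i in range(n)]
    rotations = []
    for k in range(n - 1):
        a, b = A[k][k], A[k + 1][k]
        r = math.hypot(a, b)
        c, sn = (1.0, 0.0) if r == 0.0 else (a / r, b / r)
        for j in range(n):
            x, y = A[k][j], A[k + 1][j]
            A[k][j], A[k + 1][j] = c * x + sn * y, -sn * x + c * y
        rotations.append((c, sn))
    R = [row[:] for row in A]
    for k, (c, sn) in enumerate(rotations):
        for i in range(n):
            x, y = A[i][k], A[i][k + 1]
            A[i][k], A[i][k + 1] = c * x + sn * y, -sn * x + c * y
    for i in range(n):
        A[i][i] += s
    return A, R

T = [[1.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 1.0]]  # spectrum {0, 1, 3}
s = 1.8                                                   # fixed shift (assumption)
for _ in range(300):
    T, _ = qr_step(T, s)

diagonal = [T[i][i] for i in range(3)]
offdiag = [abs(T[1][0]), abs(T[2][1])]
print(diagonal, offdiag)   # diagonal close to [0, 3, 1], off-diagonal entries negligible
```

Both subdiagonal entries decay like $(2/3)^k$ here, the ratios 1.2/1.8 and 0.8/1.2 of consecutive distances.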

We shall use height functions for the QR steps $F_s$, s near $\lambda_i$, i.e., functions $H_i: \mathcal{D}^i_{\Lambda,\epsilon} \to \mathbb{R}$ with $H_i(F_s(T)) > H_i(T)$ provided T is not diagonal. Such height
functions and related scenarios have been considered in [T], 0], [TT] and [16].

The matrix $W = \operatorname{diag}(w_1, \ldots, w_n)$ is a weight matrix if $w_1 > \cdots > w_n$. Since Λ is a.p. free, there exists $\epsilon_{ap} \in (0, \epsilon_{inv})$ such that if $s \in I_i = [\lambda_i - \epsilon_{ap}, \lambda_i + \epsilon_{ap}]$ then the numbers $|\lambda_j - s|$ are distinct and their order does not depend on s.
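One admissible value of $\epsilon_{ap}$ can be computed directly: each $|\lambda_j - s|$ is 1-Lipschitz in s, so the order of these distances cannot change while $|s - \lambda_i|$ stays below half the smallest gap between the numbers $|\lambda_j - \lambda_i|$, j = 1, ..., n (including j = i). This is a minimal sketch with hypothetical helper names; it gives a sufficient choice, not necessarily the largest one, and the text additionally requires $\epsilon_{ap} < \epsilon_{inv}$.

```python
def eps_ap_bound(lams, i):
    """Hypothetical helper: half the smallest gap among |lams[j] - lams[i]|
    (j = i included). Positive when lams[i] is not the midpoint of two other
    eigenvalues, in particular for a.p. free spectra."""
    d = [abs(lam - lams[i]) for lam in lams]
    n = len(d)
    return min(abs(d[j] - d[k]) for j in range(n) for k in range(j + 1, n)) / 2.0

def distance_order(lams, s):
    """Indices j sorted by increasing |lams[j] - s|."""
    return sorted(range(len(lams)), key=lambda j: abs(lams[j] - s))

lams = [0.0, 1.0, 3.0, 7.0]   # an a.p. free spectrum (illustrative)
i = 2                         # lambda_i = 3
eps = eps_ap_bound(lams, i)
orders = {tuple(distance_order(lams, lams[i] + t * eps))
          for t in (-0.99, -0.5, 0.0, 0.5, 0.99)}
print(eps, orders)            # 0.5 and a single ordering
```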

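Proposition 6.2 Let Λ be an a.p. free diagonal matrix, W a weight matrix and $\epsilon_{ap}$ as above. For $\delta_h > 0$, set $L_i(x) = \log((x - \lambda_i)^2 + \delta_h)$ and let $H_i: \mathcal{D}^i_{\Lambda,\epsilon_{ap}} \to \mathbb{R}$ be defined by $H_i(T) = \operatorname{tr}(W L_i(T))$. There exists $\delta_h > 0$ such that

$$\max_{T \in \partial\mathcal{D}^i_{\Lambda,\epsilon_{ap}}} H_i(T) < \min_{T \in \mathcal{D}^i_{\Lambda,0}} H_i(T)$$

and, for any $s \in I_i$, $H_i$ is a height function for $F_s: \mathcal{D}^i_{\Lambda,\epsilon_{ap}} \to \mathcal{D}^i_{\Lambda,\epsilon_{ap}}$.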

Here, $L_i(T) = X\operatorname{diag}(L_i(\lambda_1), \ldots, L_i(\lambda_n))X^{-1}$ for $T = X\Lambda X^{-1}$, so that if p is a polynomial and $L_i(\lambda_j) = p(\lambda_j)$ for j = 1, ..., n then $L_i(T) = p(T)$. The only conditions on $L_i$ which will be used in the proof are that $|\lambda_j - \lambda_i| < |\lambda_k - \lambda_i|$ implies $L_i(\lambda_j) < L_i(\lambda_k)$ and that $L_i(\lambda_i)$ is very negative (for small $\delta_h$).

The proof requires some basic facts about steps (again related to the integrability of the Toda lattice); these facts will not be used elsewhere. For a real diagonal matrix Λ with simple spectrum, let $\mathcal{O}_\Lambda$ be the set of all real symmetric matrices similar to Λ; it is well known that $\mathcal{O}_\Lambda$ is a smooth compact manifold. The f-QR step applied to a matrix $S \in \mathcal{O}_\Lambda$ is the map $F_f: \mathcal{A}_{\Lambda,f} \to \mathcal{O}_\Lambda$ defined by $F_f(S) = Q_+^*SQ_+$, where $Q_+$ is obtained from the factorization $f(S) = Q_+R_+$, and $S \in \mathcal{A}_{\Lambda,f}$ if and only if f(S) is almost invertible. If $T \in \mathcal{T}_\Lambda \cap \mathcal{A}_{\Lambda,f}$ then $F_f(T) \in \mathcal{T}_\Lambda$ (use the same proof as in Lemma 2.2). The maps $F_s: \mathcal{T}_\Lambda \to \mathcal{T}_\Lambda$ defined above correspond to restrictions of $F_f$ for f(x) = x - s.

For a continuous function $h: \mathbb{R} \to \mathbb{R}$, if $S \in \mathcal{O}_\Lambda$ then the matrix function h(S) belongs to $\mathcal{O}_M$, where M = h(Λ). With the obvious abuse of notation, we have a diffeomorphism $h: \mathcal{O}_\Lambda \to \mathcal{O}_M$ provided h is injective in the spectrum of Λ.
Lemma 6.3 For h injective in the spectrum of Λ, consider the diffeomorphism $h: \mathcal{O}_\Lambda \to \mathcal{O}_M$, where M = h(Λ). Let f and $\tilde f$ be continuous functions defined in neighborhoods of the spectra of Λ and M, respectively, satisfying $\tilde f(h(\lambda_j)) = f(\lambda_j)$ for each j, with QR steps $F_f: \mathcal{O}_\Lambda \to \mathcal{O}_\Lambda$ and $F_{\tilde f}: \mathcal{O}_M \to \mathcal{O}_M$. Then $h \circ F_f = F_{\tilde f} \circ h$.

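Proof: The hypothesis implies that, for $T \in \mathcal{O}_\Lambda$, $f(T) = \tilde f(h(T)) = QR$ and hence $F_f(T) = Q^*TQ$ and $F_{\tilde f}(h(T)) = Q^*h(T)Q$. Since matrix functions commute with orthogonal conjugation, $h(F_f(T)) = h(Q^*TQ) = Q^*h(T)Q = F_{\tilde f}(h(T))$. ■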

Let $I_r$ be the n × n truncated identity matrix, i.e., $(I_r)_{i,i} = 1$ for $i \le r$, other entries being equal to zero.

Lemma 6.4 Let M be a diagonal matrix with simple spectrum $\mu_1, \ldots, \mu_n$ and let $f: \mathbb{R} \to \mathbb{R}$ be a function for which $\mu_i < \mu_j$ implies $|f(\mu_i)| < |f(\mu_j)|$. Consider the f-QR step $F_f: \mathcal{A}_{M,f} \to \mathcal{O}_M$. For any $S \in \mathcal{A}_{M,f}$ and r = 1, ..., n - 1, $\operatorname{tr}(I_rF_f(S)) \ge \operatorname{tr}(I_rS)$. For r = 1, equality only holds if $(S)_{1,j} = 0$ for all j > 1.

This argument closely follows the first proof in [3].

Proof: Let $V_r$ be the range of $I_r$ and let $\mu_{r,j}(S)$ be the eigenvalues of the leading principal r × r minor of S, listed in nondecreasing order. We claim that $\mu_{r,j}(F_f(S)) \ge \mu_{r,j}(S)$, which immediately implies $\operatorname{tr}(I_rF_f(S)) \ge \operatorname{tr}(I_rS)$. Recall that $F_f(S) = Q_+^*SQ_+$, where $Q_+R_+ = f(S)$. Let U be an upper triangular matrix such that $Q_+u = f(S)Uu$ for $u \in V_r$. By min-max,

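$$\mu_{r,j}(S) = \max_{\substack{A \subset V_r\\ \dim A = r+1-j}}\ \min_{u \in A\setminus\{0\}} \frac{\langle u, Su\rangle}{\langle u, u\rangle},$$

$$\mu_{r,j}(F_f(S)) = \max_{A}\min_{u} \frac{\langle u, Q_+^*SQ_+u\rangle}{\langle u, u\rangle} = \max_{A}\min_{u} \frac{\langle f(S)Uu, Sf(S)Uu\rangle}{\langle f(S)Uu, f(S)Uu\rangle} = \max_{A' = UA}\ \min_{u' \in A'\setminus\{0\}} \frac{\langle f(S)u', Sf(S)u'\rangle}{\langle f(S)u', f(S)u'\rangle}.$$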

Notice that since U is upper triangular, the map taking $A \subset V_r$ to $A' = UA$ is a bijection among subspaces of $V_r$ of given dimension. Since S and f(S) are symmetric and commute,

$$\mu_{r,j}(F_f(S)) = \max_{A}\min_{u} \frac{\langle u, Sg(S)u\rangle}{\langle u, g(S)u\rangle},$$

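where $g(x) = (f(x))^2$. The claim now follows from the inequality

$$\langle u, u\rangle\langle u, Sg(S)u\rangle - \langle u, Su\rangle\langle u, g(S)u\rangle \ge 0,$$

which says that the quotient in the last formula is at least $\langle u, Su\rangle/\langle u, u\rangle$. Diagonalize $S = Q^*MQ$ and $g(S) = Q^*g(M)Q$ and write $Qu = (x_1, \ldots, x_n)$, so that

$$2\big(\langle u, u\rangle\langle u, Sg(S)u\rangle - \langle u, Su\rangle\langle u, g(S)u\rangle\big) = \sum_{k,\ell} (\mu_k - \mu_\ell)(g(\mu_k) - g(\mu_\ell))\, x_k^2 x_\ell^2 \ge 0,$$

since, by hypothesis, $\mu_k < \mu_\ell$ implies $g(\mu_k) < g(\mu_\ell)$.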
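The identity above is a symmetrization of a double sum (a form of Chebyshev's sum inequality). A minimal numerical sketch, working directly in the eigenbasis of S with illustrative eigenvalues $\mu_k$, $g(x) = (x - s)^2$ for a shift s below the spectrum, and random coordinates $x_k$; the function name `chebyshev_gap` is hypothetical.

```python
import random

def chebyshev_gap(mu, g, x):
    """Hypothetical helper: returns (lhs, rhs), where
    lhs = 2(<u,u><u,Sg(S)u> - <u,Su><u,g(S)u>) in the eigenbasis of S and
    rhs = sum_{k,l} (mu_k - mu_l)(g_k - g_l) x_k^2 x_l^2."""
    w = [xi * xi for xi in x]                       # squared coordinates x_k^2
    uu = sum(w)
    su = sum(m * wk for m, wk in zip(mu, w))
    gu = sum(gk * wk for gk, wk in zip(g, w))
    sgu = sum(m * gk * wk for m, gk, wk in zip(mu, g, w))
    lhs = 2.0 * (uu * sgu - su * gu)
    n = len(mu)
    rhs = sum((mu[k] - mu[l]) * (g[k] - g[l]) * w[k] * w[l]
              for k in range(n) for l in range(n))
    return lhs, rhs

rng = random.Random(0)
s = -1.0                            # below the spectrum, so |x - s| increases with x
mu = [0.0, 1.0, 3.0, 4.5]           # illustrative eigenvalues of S
g = [(m - s) ** 2 for m in mu]      # g = f^2 with f(x) = x - s
samples = [chebyshev_gap(mu, g, [rng.uniform(-1.0, 1.0) for _ in mu]) for _ in range(100)]
print(min(lhs for lhs, _ in samples))   # nonnegative
```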

Consider now equality for the case r = 1. Notice that, by hypothesis, if k ≠ ℓ then $(\mu_k - \mu_\ell)(g(\mu_k) - g(\mu_\ell)) > 0$. In the max-min formula for $\operatorname{tr}(I_1S) = \mu_{1,1}(S)$, it suffices to take $u = e_1$. Equality therefore holds only if $Qe_1$ is a canonical vector, which implies $(S)_{1,j} = 0$ for all j > 1. ■
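Lemma 6.4 can be illustrated with $f(x) = x - s$ and s below the whole spectrum, so that $|f|$ is increasing on the eigenvalues. This is a minimal sketch, assuming the same hypothetical Givens-based `qr_step` (its sign conventions differ from $Q_+R_+$ only by a diagonal sign matrix, which does not change diagonal entries) and an illustrative tridiagonal matrix with spectrum {0, 1, 3}; it checks that the partial traces $\operatorname{tr}(I_rF_f(S))$ never decrease along the orbit.

```python
import math

def qr_step(T, s):
    """Hypothetical helper: explicit shifted QR step via Givens rotations,
    T - sI = QR, returning (RQ + sI, R)."""
    n = len(T)
    A = [[T[i][j] - (s if i == j else 0.0) for j in range(n)] for i in range(n)]
    rotations = []
    for k in range(n - 1):
        a, b = A[k][k], A[k + 1][k]
        r = math.hypot(a, b)
        c, sn = (1.0, 0.0) if r == 0.0 else (a / r, b / r)
        for j in range(n):
            x, y = A[k][j], A[k + 1][j]
            A[k][j], A[k + 1][j] = c * x + sn * y, -sn * x + c * y
        rotations.append((c, sn))
    R = [row[:] for row in A]
    for k, (c, sn) in enumerate(rotations):
        for i in range(n):
            x, y = A[i][k], A[i][k + 1]
            A[i][k], A[i][k + 1] = c * x + sn * y, -sn * x + c * y
    for i in range(n):
        A[i][i] += s
    return A, R

T = [[1.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 1.0]]  # spectrum {0, 1, 3}
s = -1.0                       # |x - s| is increasing on {0, 1, 3}
orbit = [T]
for _ in range(8):
    T, _ = qr_step(T, s)
    orbit.append(T)

# partial[k][r-1] = tr(I_r F^k(S)) for r = 1, 2
partial = [[sum(M[i][i] for i in range(r)) for r in (1, 2)] for M in orbit]
for row in partial:
    print(row)                 # each column is nondecreasing
```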

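Proof of Proposition 6.2: For all $s \in I_i$ and any distinct eigenvalues $\lambda_j$ and $\lambda_k$, we have $|\lambda_j - s| < |\lambda_k - s|$ if and only if $|\lambda_j - \lambda_i| < |\lambda_k - \lambda_i|$, that is, if and only if $L_i(\lambda_j) < L_i(\lambda_k)$. For $s \in I_i$, take $f(x) = x - s$, $h(x) = L_i(x)$ and $\mu_j = L_i(\lambda_j)$, and define $\tilde f: \mathbb{R} \to \mathbb{R}$ as in Lemma 6.3. The function $\tilde f$ satisfies the hypothesis of Lemma 6.4: $\mu_j < \mu_k$ implies $|\tilde f(\mu_j)| < |\tilde f(\mu_k)|$. Writing $W = \sum_{r=1}^{n-1}(w_r - w_{r+1})I_r + w_nI$ with $w_r - w_{r+1} > 0$, Lemma 6.4 gives $\operatorname{tr}(WF_{\tilde f}(S)) \ge \operatorname{tr}(WS)$ for all $S \in \mathcal{A}_{M,\tilde f}$, where $M = h(\Lambda)$. For $T \in \mathcal{D}^i_{\Lambda,\epsilon_{ap}}$, take $S = h(T)$: by Lemma 6.3, $F_{\tilde f}(h(T)) = h(F_f(T))$ and therefore $\operatorname{tr}(Wh(F_f(T))) \ge \operatorname{tr}(Wh(T))$, i.e., $H_i(F_s(T)) \ge H_i(T)$. Again by Lemma 6.4, equality happens only if T is diagonal. Thus, $H_i$ is a height function. Finally, choosing $\delta_h$ sufficiently small guarantees that $H_i$ is large in $\mathcal{D}^i_{\Lambda,0}$ and small in $\partial\mathcal{D}^i_{\Lambda,\epsilon_{ap}}$, completing the proof. ■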
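The height function can be evaluated through an eigendecomposition, since $\operatorname{tr}(WL_i(T)) = \sum_{j,k} w_j V_{jk}^2 L_i(\lambda_k)$ when $T = V\operatorname{diag}(\lambda_1, \ldots, \lambda_n)V^*$. This is a minimal sketch, assuming the cyclic Jacobi method for the eigendecomposition, the same hypothetical Givens-based `qr_step`, and illustrative choices: spectrum {0, 1, 3}, $\lambda_i = 3$, shift s = 2.9, W = diag(3, 2, 1) and $\delta_h = 10^{-3}$. The starting matrix is far from deflation; as far as we can tell, the monotonicity argument via Lemmas 6.3 and 6.4 does not use smallness of b(T).

```python
import math

def qr_step(T, s):
    """Hypothetical helper: explicit shifted QR step via Givens rotations,
    T - sI = QR, returning (RQ + sI, R)."""
    n = len(T)
    A = [[T[i][j] - (s if i == j else 0.0) for j in range(n)] for i in range(n)]
    rotations = []
    for k in range(n - 1):
        a, b = A[k][k], A[k + 1][k]
        r = math.hypot(a, b)
        c, sn = (1.0, 0.0) if r == 0.0 else (a / r, b / r)
        for j in range(n):
            x, y = A[k][j], A[k + 1][j]
            A[k][j], A[k + 1][j] = c * x + sn * y, -sn * x + c * y
        rotations.append((c, sn))
    R = [row[:] for row in A]
    for k, (c, sn) in enumerate(rotations):
        for i in range(n):
            x, y = A[i][k], A[i][k + 1]
            A[i][k], A[i][k + 1] = c * x + sn * y, -sn * x + c * y
    for i in range(n):
        A[i][i] += s
    return A, R

def jacobi_eigh(A, sweeps=50):
    """Cyclic Jacobi method for a symmetric matrix: returns (d, V), A = V diag(d) V^T."""
    n = len(A)
    A = [row[:] for row in A]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(A[p][q] ** 2 for p in range(n) for q in range(n) if p != q) < 1e-30:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = (1.0 if theta >= 0.0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for M in (A, V):              # columns p, q: M <- M J
                    for k in range(n):
                        x, y = M[k][p], M[k][q]
                        M[k][p], M[k][q] = c * x - s * y, s * x + c * y
                for k in range(n):            # rows p, q: A <- J^T A
                    x, y = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * x - s * y, s * x + c * y
    return [A[i][i] for i in range(n)], V

def height(T, lam_i, weights, delta):
    """H_i(T) = tr(W L_i(T)) with L_i(x) = log((x - lam_i)^2 + delta)."""
    d, V = jacobi_eigh(T)
    L = [math.log((x - lam_i) ** 2 + delta) for x in d]
    n = len(T)
    return sum(weights[j] * V[j][k] ** 2 * L[k] for j in range(n) for k in range(n))

# Illustrative choices (assumptions): spectrum {0, 1, 3}, lambda_i = 3, s = 2.9.
T = [[1.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 1.0]]
W, lam_i, delta, s = [3.0, 2.0, 1.0], 3.0, 1e-3, 2.9
heights = [height(T, lam_i, W, delta)]
for _ in range(5):
    T, _ = qr_step(T, s)
    heights.append(height(T, lam_i, W, delta))
print(heights)   # nondecreasing
```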

Thus, simple shift strategies admit height functions near the deflation set. Our reason for constructing a height function is to control how many iterates of the sequence $(F^k_\sigma(T))$ lie in a given compact set.

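Assuming Λ to be a.p. free, for a shift strategy $\sigma: \mathcal{T}_\Lambda \to \mathbb{R}$ set $\epsilon_\sigma = \epsilon_{ap}/(1 + C_\sigma)$ (where $C_\sigma$ is the constant in the definition of a simple shift strategy). Notice that $T \in \mathcal{D}^i_{\Lambda,\epsilon_\sigma}$ implies $\sigma(T) \in I_i = [\lambda_i - \epsilon_{ap}, \lambda_i + \epsilon_{ap}]$.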

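Corollary 6.5 Let Λ be a real diagonal n × n a.p. free matrix, σ a simple shift strategy and $\mathcal{D}^i_{\Lambda,\epsilon_\sigma}$ as above. Let $\mathcal{K} \subset \mathcal{D}^i_{\Lambda,\epsilon_\sigma}$ be a compact set with no diagonal matrices: there exists $K \in \mathbb{N}$ such that for all $T \in \mathcal{D}^i_{\Lambda,\epsilon_\sigma}$ there are at most K points of the form $F^k_\sigma(T)$ in $\mathcal{K}$.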

The plan is to take $\mathcal{K}$ containing $\mathcal{S}_\sigma \cap \mathcal{D}^i_{\Lambda,\epsilon_\sigma}$: the hypothesis in Theorem 3 that diagonal matrices do not belong to the singular support $\mathcal{S}_\sigma$ is then natural.
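Proof: Let $m_-$ be the minimum jump in $\mathcal{K}$ and $m_+$ the size of the image of $H_i$:

$$m_- = \inf_{T \in \mathcal{K},\ s \in I_i}\big(H_i(F_s(T)) - H_i(T)\big), \qquad m_+ = \sup_{\mathcal{D}^i_{\Lambda,\epsilon_\sigma}} H_i - \inf_{\mathcal{D}^i_{\Lambda,\epsilon_\sigma}} H_i.$$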



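By Proposition 6.2 and the compactness of $\mathcal{K} \times I_i$, $m_- > 0$: take K such that $Km_- > m_+$. For a given T, let $X = \{k \in \mathbb{N} \mid F^k_\sigma(T) \in \mathcal{K}\}$. Since $H_i$ never decreases along the orbit and increases by at least $m_-$ at each visit to $\mathcal{K}$, for every k we have

$$m_+ \ge H_i(F^{k+1}_\sigma(T)) - H_i(T) \ge |X \cap \{0, \ldots, k\}|\, m_-$$

and therefore $|X| < K$. ■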

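Proof of Theorem 3: Let $\mathcal{K}_1, \mathcal{K}_2 \subset \mathcal{D}^i_{\Lambda,\epsilon_\sigma}$ be compact sets with $\mathcal{K}_1 \cup \mathcal{K}_2 = \mathcal{D}^i_{\Lambda,\epsilon_\sigma}$, $\mathcal{S}_\sigma \cap \mathcal{D}^i_{\Lambda,0}$ disjoint from $\mathcal{K}_1$ and no diagonal matrices in $\mathcal{K}_2$. By Theorem 2, there exists $C_{\mathcal{K}_1} > 0$ such that $|b(F_\sigma(T))| \le C_{\mathcal{K}_1}|b(T)|^3$ for all $T \in \mathcal{K}_1$. By Corollary 6.5, there exists $K_2 \in \mathbb{N}$ such that, given $T \in \mathcal{D}^i_{\Lambda,\epsilon_\sigma}$, at most $K_2$ points of the form $F^k_\sigma(T)$ belong to $\mathcal{K}_2$. In particular, there are at most $K_2$ values of k for which the estimate $|b(F^{k+1}_\sigma(T))| \le C_{\mathcal{K}_1}|b(F^k_\sigma(T))|^3$ does not hold. ■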



7 Convergence rates for a.p. spectra 

The aim of this section is to prove Theorem 4. An a.p. matrix $T \in \mathcal{T}$ with simple spectrum is strong a.p. if three consecutive eigenvalues are in arithmetic progression, and weak a.p. otherwise.

In the a.p. free case discussed in the previous sections, for an initial condition $T \in \mathcal{D}^i_{\Lambda,\epsilon}$, the sequence $F^k_\sigma(T)$ converges to a diagonal matrix; this follows from the fact that $\sigma(T) \approx \lambda_i$ for $T \in \mathcal{D}^i_{\Lambda,\epsilon}$. For weak a.p. spectra, convergence to a diagonal matrix may not occur.

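Assume Λ to be weak a.p. Let $b_2(T) = (T)_{n-1,n-2}$ be the second-last subdiagonal entry; for consistency, write $b_1(T) = b(T)$. For any i, there exists a unique index $c(i) \ne i$ such that $\lambda_{c(i)}$ is the eigenvalue closest to $\lambda_i$ (two equidistant neighbors would form, with $\lambda_i$, three consecutive eigenvalues in arithmetic progression). As we shall see, if $T \in \mathcal{D}^i_{\Lambda,\epsilon}$ then

$$\lim_{k\to\infty} b_1(F^k_\sigma(T)) = \lim_{k\to\infty} b_2(F^k_\sigma(T)) = 0, \qquad \lim_{k\to\infty} (F^k_\sigma(T))_{n,n} = \lambda_i;$$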
furthermore, if T is unreduced then

$$\lim_{k\to\infty} (F^k_\sigma(T))_{n-1,n-1} = \lambda_{c(i)}.$$
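The trichotomy a.p. free / weak a.p. / strong a.p., and the index c(i), are easy to compute for a given spectrum. This is a minimal sketch, assuming a floating-point tolerance for detecting arithmetic progressions; the helper names `classify_spectrum` and `closest_index` are hypothetical, and indices are 0-based. The spectrum diag(-1, 0, 0.3, 1) used in Section 8 serves as the weak a.p. example.

```python
def classify_spectrum(lams, tol=1e-12):
    """Return 'a.p. free', 'weak a.p.' or 'strong a.p.' for a simple spectrum."""
    lams = sorted(lams)
    n = len(lams)

    def in_ap(a, b, c):
        return abs((b - a) - (c - b)) <= tol

    if any(in_ap(lams[i], lams[i + 1], lams[i + 2]) for i in range(n - 2)):
        return "strong a.p."          # three consecutive eigenvalues in a.p.
    if any(in_ap(lams[i], lams[j], lams[k])
           for i in range(n) for j in range(i + 1, n) for k in range(j + 1, n)):
        return "weak a.p."
    return "a.p. free"

def closest_index(lams, i):
    """c(i): index of the eigenvalue closest to lams[i], other than i itself."""
    return min((j for j in range(len(lams)) if j != i),
               key=lambda j: abs(lams[j] - lams[i]))

print(classify_spectrum([1, 2, 4]), classify_spectrum([-1, 0, 1]),
      classify_spectrum([-1, 0, 0.3, 1]))
lams = [-1.0, 0.0, 0.3, 1.0]
print([closest_index(lams, i) for i in range(len(lams))])   # [1, 2, 1, 2]
```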

We begin with a technical lemma concerning the dynamics of steps $F_s$. Item (b) is a variation of the power method argument used to study the convergence of lower entries under QR steps.

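Lemma 7.1 Let $M = \operatorname{diag}(\mu_1, \ldots, \mu_m)$ be a real diagonal matrix with simple spectrum and $\mathcal{T}_M \subset \mathcal{T}$ be the manifold of real m × m tridiagonal matrices similar to M. Let $\mathcal{I} \subset \mathbb{R}$ be a compact interval. Assume that there exists j, 1 ≤ j ≤ m, such that

$$\mu_j \notin \mathcal{I}, \qquad \max_{s \in \mathcal{I}}|\mu_j - s| < \min_{k \ne j,\ s \in \mathcal{I}}|\mu_k - s|.$$

Let $\mathcal{D}^j_{M,\epsilon} \subset \mathcal{T}_M$ be the j-th deflation neighborhood.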

(a) There exist ε > 0 and C ∈ (0, 1) such that for all ε' ∈ (0, ε) and $s \in \mathcal{I}$ we have $F_s(\mathcal{D}^j_{M,\epsilon'}) \subset \mathcal{D}^j_{M,C\epsilon'}$.

(b) Consider $T_0 \in \mathcal{T}_M$ unreduced, a sequence $(s_k)$ of elements of $\mathcal{I}$ and ε > 0. Define $T_{k+1} = F_{s_k}(T_k)$. Then there exists k such that $T_k \in \mathcal{D}^j_{M,\epsilon}$.

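This will be used to study $b_2(T)$ for $T \in \mathcal{D}^i_{\Lambda,\epsilon}$, setting $\mathcal{I} = [\lambda_i - \epsilon, \lambda_i + \epsilon]$, j = c(i), $M = \hat\Lambda_i = \operatorname{diag}(\lambda_1, \ldots, \lambda_{i-1}, \lambda_{i+1}, \ldots, \lambda_n)$, with the natural identification between $\mathcal{T}_M$ and $\mathcal{D}^i_{\Lambda,0}$.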

Proof: Let C ∈ (0, 1) be such that

$$\max_{s \in \mathcal{I}}|\mu_j - s| < C\min_{k \ne j,\ s \in \mathcal{I}}|\mu_k - s|.$$

Write

$$r(s,T) = \frac{(R_+)_{m,m}}{(R_+)_{m-1,m-1}}, \qquad \text{where } T - sI = Q_+R_+.$$



Recall from Lemma 2.2 and Corollary 4.3 that $b(F_s(T)) = r(s,T)\,b(T)$. We claim that for all $T \in \mathcal{D}^j_{M,0}$ and $s \in \mathcal{I}$, $|r(s,T)| < C$. Since $T \in \mathcal{D}^j_{M,0}$, $|(R_+)_{m,m}| = |\mu_j - s|$. Let $R_-$ be the leading principal minor of $R_+$ of order m - 1: its singular values are $|\mu_k - s|$, k ≠ j. In particular, all singular values are larger than $|(R_+)_{m,m}|/C$. Thus

$$|(R_+)_{m-1,m-1}| = \|e_{m-1}^*R_-\| > \frac{|(R_+)_{m,m}|}{C}\,\|e_{m-1}\| = \frac{|(R_+)_{m,m}|}{C},$$

proving our claim. Take $C' = (1 + C)/2$: by continuity, for sufficiently small ε > 0 we have $|r(s,T)| < C'$ for all $T \in \mathcal{D}^j_{M,\epsilon}$ and $s \in \mathcal{I}$. Thus, for $T \in \mathcal{D}^j_{M,\epsilon}$ and $s \in \mathcal{I}$, $|b(F_s(T))| \le C'|b(T)|$; item (a) follows.

For item (b), write $T_{k+1} = Q_k^*T_kQ_k$, where $T_k - s_kI = Q_kR_k$ is a QR decomposition. Notice that, by hypothesis, $\mathcal{I}$ is disjoint from the spectrum, so that $T_0 - s_0I$ is invertible. We have $(T_0 - s_0I)^{-1} = R_0^{-1}Q_0^*$, so the rows of $Q_0^*$ are obtained from those of $(T_0 - s_0I)^{-1}$ by Gram-Schmidt from bottom to top. In particular, $Q_0e_m = c_0(T_0 - s_0I)^{-1}e_m$, $c_0 > 0$. More generally, we claim that

$$P_ke_m = c\,(T_0 - s_{k-1}I)^{-1}\cdots(T_0 - s_1I)^{-1}(T_0 - s_0I)^{-1}e_m, \qquad c > 0, \qquad P_k = Q_0Q_1\cdots Q_{k-1} \in O(m).$$

Indeed, by induction and using that $T_1 = Q_0^*T_0Q_0$,

$$\begin{aligned} P_ke_m &= c'\,Q_0(T_1 - s_{k-1}I)^{-1}\cdots(T_1 - s_1I)^{-1}e_m\\ &= c'\,(T_0 - s_{k-1}I)^{-1}\cdots(T_0 - s_1I)^{-1}Q_0e_m\\ &= c\,(T_0 - s_{k-1}I)^{-1}\cdots(T_0 - s_1I)^{-1}(T_0 - s_0I)^{-1}e_m.\end{aligned}$$
(Integrability of the Toda lattice is present here yet another time.) For a = 1, ..., m, let $v_a$ be the unit eigenvector of $T_0$ associated to $\mu_a$. We claim that



$$\lim_{k\to\infty} P_ke_m = \pm v_j.$$



Indeed, write $e_m = \sum_{a=1}^m \alpha_a v_a$, where $\alpha_a = \langle v_a, e_m\rangle$ is the last coordinate of $v_a$. It is well known that the last coordinates of the eigenvectors $v_a$ of the unreduced matrix $T_0$ are nonzero: in particular, $\alpha_j \ne 0$; assume without loss $\alpha_j > 0$. We have

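$$P_ke_m = c\,(T_0 - s_{k-1}I)^{-1}\cdots(T_0 - s_0I)^{-1}e_m = c\sum_a \frac{\alpha_a}{(\mu_a - s_{k-1})\cdots(\mu_a - s_0)}\,v_a = c_k\Big(v_j + \sum_{a \ne j} b_{k,a}v_a\Big),$$

$$c_k \ne 0, \qquad b_{k,a} = \frac{\alpha_a}{\alpha_j}\cdot\frac{\mu_j - s_{k-1}}{\mu_a - s_{k-1}}\cdots\frac{\mu_j - s_0}{\mu_a - s_0}.$$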

Since $|\mu_j - s_l|/|\mu_a - s_l| < C$ for each l, we have $|b_{k,a}| < C^k|\alpha_a/\alpha_j|$ and therefore $\lim_{k\to\infty} b_{k,a} = 0$, proving the claim. We have

$$\begin{aligned}\lim_{k\to\infty} b(T_k) &= \lim_{k\to\infty}(T_k)_{m,m-1} = \lim_{k\to\infty} e_{m-1}^*T_ke_m = \lim_{k\to\infty}(P_ke_{m-1})^*T_0(P_ke_m)\\ &= \lim_{k\to\infty}(P_ke_{m-1})^*\mu_j(P_ke_m) + \lim_{k\to\infty}(P_ke_{m-1})^*(T_0 - \mu_jI)(P_ke_m).\end{aligned}$$

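The first limit in the last expression is zero because $P_ke_{m-1} \perp P_ke_m$; the second is zero because $P_ke_{m-1}$ is bounded and

$$\lim_{k\to\infty}(T_0 - \mu_jI)(P_ke_m) = (T_0 - \mu_jI)(\pm v_j) = 0.$$

Thus $b(T_k) \to 0$; since also $(T_k)_{m,m} = (P_ke_m)^*T_0(P_ke_m) \to \mu_j$, we have $T_k \in \mathcal{D}^j_{M,\epsilon}$ for k large, proving item (b). ■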
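Item (b) of Lemma 7.1 can be observed numerically. This is a minimal sketch, assuming the same hypothetical Givens-based `qr_step`, the illustrative unreduced matrix with spectrum {0, 1, 3} and the interval $\mathcal{I} = [1.3, 1.5]$: the eigenvalue $\mu_j = 1$ lies outside $\mathcal{I}$ and $\max_{s \in \mathcal{I}}|1 - s| = 0.5 < 1.3 = \min_{k \ne j,\ s \in \mathcal{I}}|\mu_k - s|$. The shifts $s_k$ cycle through three values in $\mathcal{I}$; any sequence in $\mathcal{I}$ should do.

```python
import math

def qr_step(T, s):
    """Hypothetical helper: explicit shifted QR step via Givens rotations,
    T - sI = QR, returning (RQ + sI, R)."""
    n = len(T)
    A = [[T[i][j] - (s if i == j else 0.0) for j in range(n)] for i in range(n)]
    rotations = []
    for k in range(n - 1):
        a, b = A[k][k], A[k + 1][k]
        r = math.hypot(a, b)
        c, sn = (1.0, 0.0) if r == 0.0 else (a / r, b / r)
        for j in range(n):
            x, y = A[k][j], A[k + 1][j]
            A[k][j], A[k + 1][j] = c * x + sn * y, -sn * x + c * y
        rotations.append((c, sn))
    R = [row[:] for row in A]
    for k, (c, sn) in enumerate(rotations):
        for i in range(n):
            x, y = A[i][k], A[i][k + 1]
            A[i][k], A[i][k + 1] = c * x + sn * y, -sn * x + c * y
    for i in range(n):
        A[i][i] += s
    return A, R

T = [[1.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 1.0]]  # unreduced, spectrum {0, 1, 3}
shifts = [1.3, 1.4, 1.5]                                  # elements of I = [1.3, 1.5]
history = []
for k in range(60):
    T, _ = qr_step(T, shifts[k % 3])
    history.append((abs(T[2][1]), T[2][2]))               # (|b(T_k)|, (T_k)_{m,m})
print(history[-1])   # b close to 0 and bottom entry close to mu_j = 1
```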



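Consider the double deflation set $\mathcal{C}_{\Lambda,0} \subset \mathcal{D}_{\Lambda,0} \subset \mathcal{T}_\Lambda$:

$$\mathcal{C}_{\Lambda,0} = \{T \in \mathcal{T}_\Lambda \mid b_1(T) = b_2(T) = 0\}.$$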

For Wilkinson's strategy ω, it turns out that the set $\mathcal{C}_{\Lambda,0}$ is disjoint from the singular support $\mathcal{S}_\omega$. More generally, if a shift strategy σ satisfies $\mathcal{C}_{\Lambda,0} \cap \mathcal{S}_\sigma = \emptyset$ then cubic convergence of $F_\sigma$ holds even for weak a.p. spectra: this is Theorem 4, which we prove below.

In [9], we show examples of unreduced tridiagonal 3 × 3 matrices with spectrum -1, 0, 1 for which Wilkinson's shift converges quadratically to a reduced but not diagonal matrix in the singular support $\mathcal{S}_\omega$. Similarly, we conjecture that for strong a.p. diagonal n × n matrices Λ there exists a set $X \subset \mathcal{T}_\Lambda$ of Hausdorff codimension 1 of unreduced matrices T for which $F^k_\omega(T)$ converges quadratically to a matrix in $\mathcal{S}_\omega \cap \mathcal{D}_{\Lambda,0}$ with $(T)_{n-1,n-2} \ne 0$.

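With the natural identification between $\mathcal{D}^i_{\Lambda,0}$ and $\mathcal{T}_{\hat\Lambda_i}$, we may consider $\mathcal{D}^{c(i)}_{\hat\Lambda_i,\epsilon_2}$ to be a subset of $\mathcal{D}^i_{\Lambda,0}$. Let

$$\mathcal{C}^{c(i),i}_{\Lambda,\epsilon_2,\epsilon_1} = \{T \in \mathcal{D}^i_{\Lambda,\epsilon_1} \mid \Pi_i(T) \in \mathcal{D}^{c(i)}_{\hat\Lambda_i,\epsilon_2}\}.$$

For small $\epsilon_1, \epsilon_2 > 0$, $T \in \mathcal{C}^{c(i),i}_{\Lambda,\epsilon_2,\epsilon_1}$ implies

$$(T)_{n-1,n-1} \approx \lambda_{c(i)}, \qquad (T)_{n,n} \approx \lambda_i, \qquad |b_1(T)| \le \epsilon_1, \qquad b_2(T) \approx 0.$$

These compact sets turn out to be manifolds with corners, but we shall neither prove nor use this fact. Lemma 7.1 can be rephrased in terms of the sets $\mathcal{C}^{c(i),i}_{\Lambda,\epsilon_2,\epsilon_1}$.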
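Corollary 7.2 Let Λ have weak a.p. spectrum and let σ be a simple shift strategy. There exists ε > 0 such that, for all i and for all $\epsilon_1 \in (0, \epsilon)$:

(a) there exists C ∈ (0, 1) such that, for all sufficiently small $\epsilon_2 > 0$, we have $F_\sigma(\mathcal{C}^{c(i),i}_{\Lambda,\epsilon_2,\epsilon_1}) \subset \mathcal{C}^{c(i),i}_{\Lambda,C\epsilon_2,\epsilon_1}$;

(b) for all unreduced $T \in \mathcal{D}^i_{\Lambda,\epsilon}$ and for all $\epsilon_1, \epsilon_2 > 0$ there exists k such that $F^k_\sigma(T) \in \mathcal{C}^{c(i),i}_{\Lambda,\epsilon_2,\epsilon_1}$.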

Proof: Combine Lemma 7.1 with $\Pi_i \circ F_s = F_s \circ \Pi_i$ (Proposition 4.1). ■
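Proof of Theorem 4: From the hypothesis that $\mathcal{C}_{\Lambda,0}$ and $\mathcal{S}_\sigma$ are disjoint it follows that, for sufficiently small $\epsilon_1, \epsilon_2 > 0$, the shift strategy σ is smooth in $\mathcal{C}^{c(i),i}_{\Lambda,\epsilon_2,\epsilon_1}$. As in Lemma 5.2, from a Taylor expansion around $T \in \mathcal{D}^i_{\Lambda,0}$, there exists $C_2$ such that $|\sigma(T) - \lambda_i| \le C_2|b_1(T)|^2$ for all $T \in \mathcal{C}^{c(i),i}_{\Lambda,\epsilon_2,\epsilon_1}$. As in the proof of Theorem 2, there exists $C_3$ such that $|b_1(F_\sigma(T))| \le C_3|b_1(T)|^3$ for all $T \in \mathcal{C}^{c(i),i}_{\Lambda,\epsilon_2,\epsilon_1}$. From item (a) of Corollary 7.2, $\mathcal{C}^{c(i),i}_{\Lambda,\epsilon_2,\epsilon_1}$ is invariant under $F_\sigma$; from item (b), for all unreduced $T \in \mathcal{D}^i_{\Lambda,\epsilon}$ (where ε is sufficiently small) there exists K such that, for all k ≥ K, $F^k_\sigma(T) \in \mathcal{C}^{c(i),i}_{\Lambda,\epsilon_2,\epsilon_1}$, completing the proof. ■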



8 Two counterexamples 

In this section we present two examples which show that natural strengthenings of Theorems 3 and 4 do not hold for Wilkinson's strategy ω.

We use the notation of Section 3. In Figure 4, where Λ = diag(1, 2, 4), we indicate a sequence $F^k_\omega(T)$ which enters the deflation neighborhood $\mathcal{D}^i_{\Lambda,\epsilon}$ near one diagonal matrix but travels within the neighborhood towards another diagonal matrix. Theorem 2 guarantees the cubic decay of the (3, 2) entry whenever $F^k_\omega(T)$ stays away from the singular support $\mathcal{S}_\omega$. Consistently with Theorem 3, this happens for practically all values of k. Notice, however, that no uniform bound exists on the number of iterations needed to reach (a neighborhood of) $\mathcal{S}_\omega$. As proved in [5], in this instance cubic decay does not hold. More precisely, it is not true that, given an a.p. free matrix Λ, there exist C > 0 and K such that $|b(F^{k+1}_\omega(T))| \le C|b(F^k_\omega(T))|^3$ for all k ≥ K.







Figure 4: We may have F*(T) £ for large values of k. 
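Consider now the weak a.p. spectrum Λ = diag(-1, 0, 0.3, 1) and

$$T_0 = \begin{pmatrix} 0.3 & 0\\ 0 & S_0 \end{pmatrix} \in \mathcal{T}_\Lambda,$$

where $S_0 \in \mathcal{T}_{\Lambda_3}$, $\Lambda_3 = \operatorname{diag}(-1, 0, 1)$, is an example of an unreduced matrix obtained in [5] for which convergence is strictly quadratic, i.e.,

$$C_-|b(F^k_\omega(S_0))|^2 \le |b(F^{k+1}_\omega(S_0))| \le C_+|b(F^k_\omega(S_0))|^2,$$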
for all k, where $0 < C_- < C_+$. Trivially, the analogous estimate holds for $b(F^k_\omega(T_0))$. By sheer continuity, given K, there exists ε > 0 such that if $T \in \mathcal{T}_\Lambda$ satisfies $\|T - T_0\| < \epsilon$ then

$$C_-|b(F^k_\omega(T))|^2 \le |b(F^{k+1}_\omega(T))| \le C_+|b(F^k_\omega(T))|^2$$

still holds for all k < K. Thus, the uniform estimate in Theorem 3 fails for weak a.p. spectra, even for unreduced matrices.

9 Appendix: Proof of Theorem 1

Recall from Section 2 the matrices $E_j \in \mathcal{E}$ and the involutions $\eta_j$, defined by $\eta_j(T) = E_jTE_j$, which differs from T only in the sign of the j-th subdiagonal coordinate. Let $\mathcal{M}_j \subset \mathcal{T}_\Lambda$ be the mirror, i.e., the set of fixed points of $\eta_j$: for $T \in \mathcal{T}_\Lambda$ we have $T \in \mathcal{M}_j$ if and only if $(T)_{j+1,j} = 0$. Let $S_n$ be the symmetric group of permutations π of the set {1, 2, ..., n}. For π ∈ $S_n$, let $\mathcal{M}_{j,\pi} \subset \mathcal{M}_j$ be the set of matrices for which the eigenvalues of the top j × j principal subblock are $\lambda_{\pi(1)}, \ldots, \lambda_{\pi(j)}$, so that

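$$\mathcal{M}_{j,\pi} \approx \mathcal{T}_{\operatorname{diag}(\lambda_{\pi(1)}, \ldots, \lambda_{\pi(j)})} \times \mathcal{T}_{\operatorname{diag}(\lambda_{\pi(j+1)}, \ldots, \lambda_{\pi(n)})}.$$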

Thus, $\mathcal{M}_j$ is a submanifold of codimension 1 with $\binom{n}{j}$ connected components $\mathcal{M}_{j,\pi}$. The diagonal matrices in $\mathcal{T}_\Lambda$ are labeled by π ∈ $S_n$: let

$$\Lambda_\pi = \operatorname{diag}(\lambda_{\pi(1)}, \ldots, \lambda_{\pi(n)}).$$

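Let $\mathcal{J}_\Lambda \subset \mathcal{T}_\Lambda$ be the set of tridiagonal matrices with nonnegative subdiagonal entries. The set $\mathcal{J}_\Lambda$ is homeomorphic to the permutohedron $\mathcal{P}_\Lambda$ ([16]), the convex hull of the points

$$v_\pi = (\lambda_{\pi^{-1}(1)}, \ldots, \lambda_{\pi^{-1}(n)}) \in \mathbb{R}^n, \qquad \pi \in S_n;$$

the vertices of $\mathcal{P}_\Lambda$ are the points $v_\pi$. An explicit homeomorphism takes $T = Q^*\Lambda Q$ to the vector in $\mathbb{R}^n$ whose j-th coordinate is $(Q\Lambda Q^*)_{j,j}$; this map takes $\Lambda_\pi$ to $v_\pi$ ([2]). We use this map to endow $\mathcal{J}_\Lambda$ with a combinatorial structure of vertices, faces and hyperfaces: in particular, the vertices of $\mathcal{J}_\Lambda$ are diagonal matrices. It turns out that the hyperfaces of $\mathcal{J}_\Lambda$ are the intersections $\mathcal{M}_{j,\pi} \cap \mathcal{J}_\Lambda$.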
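The explicit map onto the permutohedron can be sketched in code. This is a minimal sketch, assuming the cyclic Jacobi method for the eigendecomposition and hypothetical helper names: row j of Q is the unit eigenvector of T for $\lambda_j$, so the j-th coordinate is $\sum_k Q_{jk}^2\lambda_k$. Membership in $\mathcal{P}_\Lambda$ is tested with Rado's majorization criterion (a point lies in the convex hull of the permutations of $(\lambda_1, \ldots, \lambda_n)$ exactly when it is majorized by that vector), which the text does not use.

```python
import math

def jacobi_eigh(A, sweeps=50):
    """Cyclic Jacobi method for a symmetric matrix: returns (d, V), A = V diag(d) V^T."""
    n = len(A)
    A = [[float(x) for x in row] for row in A]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(A[p][q] ** 2 for p in range(n) for q in range(n) if p != q) < 1e-30:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = (1.0 if theta >= 0.0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for M in (A, V):              # columns p, q: M <- M J
                    for k in range(n):
                        x, y = M[k][p], M[k][q]
                        M[k][p], M[k][q] = c * x - s * y, s * x + c * y
                for k in range(n):            # rows p, q: A <- J^T A
                    x, y = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * x - s * y, s * x + c * y
    return [A[i][i] for i in range(n)], V

def permutohedron_point(T, lams):
    """Hypothetical helper: coordinates (Q Lambda Q^*)_{jj} for T = Q^* Lambda Q,
    Lambda = diag(lams) with lams increasing."""
    d, V = jacobi_eigh(T)
    order = sorted(range(len(d)), key=lambda k: d[k])   # column of V for lams[j]
    n = len(lams)
    return [sum(V[k][order[j]] ** 2 * lams[k] for k in range(n)) for j in range(n)]

def majorized_by(p, lams, tol=1e-9):
    """Rado's criterion for p in the convex hull of the permutations of lams."""
    a, b = sorted(p, reverse=True), sorted(lams, reverse=True)
    partial_a = partial_b = 0.0
    for x, y in zip(a, b):
        partial_a += x
        partial_b += y
        if partial_a > partial_b + tol:
            return False
    return abs(partial_a - partial_b) <= tol

print(permutohedron_point([[3.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]], [1.0, 2.0, 3.0]))
B = [[1.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 1.0]]   # spectrum {0, 1, 3}
p = permutohedron_point(B, [0.0, 1.0, 3.0])
print(p, majorized_by(p, [0.0, 1.0, 3.0]))
```

For the diagonal matrix diag(3, 1, 2) the point is the vertex (2, 3, 1), in line with $\Lambda_\pi \mapsto v_\pi$; the unreduced example lands in the interior of the hexagon.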

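Lemma 9.1 Let $\mathcal{P} \subset \mathbb{R}^n$ be the convex hull of a finite set. Let $F: \mathcal{P} \to \mathcal{P}$ be a continuous function. Assume that for any hyperface $\mathcal{Q} \subset \mathcal{P}$ we have $F(\mathcal{Q}) \subset \mathcal{Q}$. Then F is surjective.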

Proof: The dimension of a convex subset of $\mathbb{R}^n$ is the dimension of the affine subspace spanned by its points. Notice that any face (of any dimension) is the intersection of hyperfaces and therefore also invariant under F.

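We use relative homology: if the dimension of $\mathcal{P}$ is d then $H_d(\mathcal{P}, \partial\mathcal{P}) = \mathbb{Z}$; we prove that $F_*: H_d(\mathcal{P}, \partial\mathcal{P}) \to H_d(\mathcal{P}, \partial\mathcal{P})$ is the identity. This implies the lemma: if $x_0$ is an interior point of $\mathcal{P}$ not in the image of F then, since $H_d(\mathcal{P}, \mathcal{P} \setminus \{x_0\}) = H_d(\mathcal{P}, \partial\mathcal{P})$, we have $F_* = 0$, a contradiction.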

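The proof of the claim is by induction on the dimension d of $\mathcal{P}$. The case d = 0 is trivial; in the case d = 1 the polytope $\mathcal{P}$ is an interval and F takes each endpoint to itself, and again the claim is easy. In general, let $\mathcal{Q}$ be a hyperface of $\mathcal{P}$, so that the dimension of $\mathcal{Q}$ is d - 1 and, by induction, $F_*: H_{d-1}(\mathcal{Q}, \partial\mathcal{Q}) \to H_{d-1}(\mathcal{Q}, \partial\mathcal{Q})$ is the identity. We have $\mathcal{Q} \subset \partial\mathcal{P}$ and $H_{d-1}(\mathcal{Q}, \partial\mathcal{Q}) = H_{d-1}(\partial\mathcal{P}, \partial\mathcal{Q} \cup (\partial\mathcal{P} \setminus \mathcal{Q})) = H_{d-1}(\partial\mathcal{P})$, and therefore $F_*: H_{d-1}(\partial\mathcal{P}) \to H_{d-1}(\partial\mathcal{P})$ is the identity. Since $\mathcal{P}$ is contractible, the long exact sequence for relative homology implies that $F_*: H_d(\mathcal{P}, \partial\mathcal{P}) \to H_d(\mathcal{P}, \partial\mathcal{P})$ is the identity, completing the proof. ■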

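Proof of Theorem 1: For (a), first notice that the condition $F \circ \eta_i = \eta_i \circ F$ implies $F(\mathcal{M}_i) \subset \mathcal{M}_i$. Since diagonal matrices are fixed points and each connected component $\mathcal{M}_{i,\pi}$ contains a diagonal matrix, continuity implies $F(\mathcal{M}_{i,\pi}) \subset \mathcal{M}_{i,\pi}$. Restrict F to $\mathcal{J}_\Lambda$ and drop signs to define a continuous map $\tilde F: \mathcal{J}_\Lambda \to \mathcal{J}_\Lambda$ which keeps each hyperface of $\mathcal{J}_\Lambda$ invariant. By Lemma 9.1, $\tilde F$ is surjective and therefore (by equivariance) so is F.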

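For (b), let $\mathcal{B}_i \subset \mathcal{T}_\Lambda$ be the basin of attraction of the invariant neighborhood $\operatorname{int}(\mathcal{K}_i)$, i.e., $T \in \mathcal{B}_i$ if there exists $k \in \mathbb{N}$ such that $F^k(T) \in \operatorname{int}(\mathcal{K}_i)$. The sets $\mathcal{B}_i$ are clearly disjoint, with $\mathcal{K}_i \subset \mathcal{B}_i$. They are also open subsets of $\mathcal{T}_\Lambda$, since $\mathcal{B}_i = \bigcup_k F^{-k}(\operatorname{int}(\mathcal{K}_i))$. Since $\mathcal{T}_\Lambda$ is connected, there exists $T \notin \bigcup_i \mathcal{B}_i$ and we are done. ■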